Video Captioning

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Video captioning is the task of automatically generating a sentence that describes what is happening in a video. Deep learning methods have improved greatly at this task over the last 3-4 years, but describing a video in natural language has several disadvantages. Work in our lab has proposed using a formal language instead, one that describes videos in terms of objects and the relations between them: for example, “throws(person,ball)” instead of “a person is throwing a ball” (see http://www.cs.ox.ac.uk/publications/publication14259-abstract.html). This project will extend that paper. This could mean scaling the approach to larger datasets using the latest deep learning techniques, or developing the idea further to make it more efficient or to improve its performance. Advanced Machine Learning is a prerequisite, and proficiency in logic and reasoning could also be useful, depending on the direction the project takes.
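To make the idea of a formal-language caption concrete, here is a minimal Python sketch of how a caption such as “throws(person,ball)” might be held as structured data rather than free text. It is an illustration only and is not taken from the paper: the names `Fact`, `parse_fact` and `format_fact` are hypothetical, and the sketch assumes captions are flat predicates with comma-separated arguments and no nesting.

```python
import re
from typing import NamedTuple, Tuple


class Fact(NamedTuple):
    """One formal caption: a relation name and the objects it relates (hypothetical type)."""
    predicate: str
    arguments: Tuple[str, ...]


# Assumed syntax: name(arg1,arg2,...) with no nested parentheses.
_FACT_PATTERN = re.compile(r"^\s*(\w+)\s*\(\s*([^()]*)\)\s*$")


def parse_fact(text: str) -> Fact:
    """Parse a string such as 'throws(person,ball)' into a Fact."""
    match = _FACT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a flat predicate: {text!r}")
    predicate, raw_args = match.groups()
    # Split on commas and drop surrounding whitespace and empty entries.
    arguments = tuple(a.strip() for a in raw_args.split(",") if a.strip())
    return Fact(predicate, arguments)


def format_fact(fact: Fact) -> str:
    """Render a Fact back into the compact textual form."""
    return f"{fact.predicate}({','.join(fact.arguments)})"


if __name__ == "__main__":
    fact = parse_fact("throws(person, ball)")
    print(fact.predicate, fact.arguments)
    print(format_fact(fact))
```

One practical consequence of a representation like this is that two captions can be compared exactly, field by field, instead of through approximate sentence-similarity measures.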