Keyword searching audio/video files

Supervisor

Harjinder Lallie
(https://warwick.ac.uk/fac/sci/wmg/people/profile/?wmgid=856)

Suitable for

Mathematics and Computer Science, Part C
Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part B

Abstract

Digital forensic investigators often search for keywords on hard disks and other storage media. Keywords are easily searchable in PDF, Word, plain-text, and other popular document formats; however, current digital forensic tools do not support keyword searching through video or audio files. Such a capability is essential in cases that involve dashcam footage, recorded conversations, and similar material. The aim of this project is to produce a tool that automatically transcribes audio data and then performs a keyword search on the transcript, pinpointing the point(s) in the file where the keyword(s) appear. You will be expected to develop the solution in Python and, if possible, to integrate it with Autopsy, an open-source digital forensics tool.
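To make the search step concrete, here is a minimal sketch of how keyword hits could be mapped back to positions in a recording. It assumes the transcription stage has already produced a list of timestamped segments. The names `Segment`, `Hit`, `find_keywords`, and `format_timestamp` are hypothetical and chosen for illustration only; they are not part of Autopsy or any library.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of transcribed audio (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str     # transcript of this stretch


@dataclass
class Hit:
    """One keyword match, located by the segment it occurred in."""
    keyword: str
    start: float
    end: float
    context: str


def find_keywords(segments, keywords):
    """Return a Hit for every (segment, keyword) pair that matches.

    Matching is case-insensitive and whole-word only, so the keyword
    "ware" does not match the word "warehouse".
    """
    patterns = {
        kw: re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for seg in segments:
        for kw, pattern in patterns.items():
            if pattern.search(seg.text):
                hits.append(Hit(kw, seg.start, seg.end, seg.text))
    return hits


def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS (fractions dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

In this sketch a hit is only as precise as the segment that contains it, so shorter segments give finer localisation at the cost of more transcription calls. Whole-word matching is a design choice here; a real tool might also want stemming or phonetic matching to cope with transcription errors.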

Prerequisite. Additional support is available through access to selected elements of my digital forensics course at the University of Warwick, in the form of recorded lectures comprising around 10 hours of learning. You are likely to use the Python SpeechRecognition library and possibly PyAudio. For convenience, and to demonstrate practicality, you may want to integrate the solution with the open-source forensics tool Autopsy, in which case you will need to develop a good understanding of this tool, particularly its keyword search facility.
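As a rough illustration of the transcription stage, the sketch below uses the SpeechRecognition library to transcribe a recording in fixed-length chunks, so that each piece of text carries an approximate time range. It rests on several assumptions: the input is already a WAV, AIFF, or FLAC file (audio from a video would first need extracting with an external tool), the 30-second chunk length is arbitrary, and the function names `chunk_offsets` and `transcribe_file` are hypothetical. Note that `recognize_google` sends audio to an online service, which may be unacceptable for evidential material; the library also offers an offline `recognize_sphinx` backend, which requires the separate PocketSphinx package. PyAudio is only needed for live microphone input, not for reading files.

```python
def chunk_offsets(total_seconds, chunk_seconds=30.0):
    """Yield (start, end) pairs, in seconds, covering the whole recording."""
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        yield (start, end)
        start = end


def transcribe_file(path, chunk_seconds=30.0):
    """Return a list of (start, end, text) tuples for an audio file.

    Assumes the third-party SpeechRecognition package is installed;
    it is imported here so that chunk_offsets stays usable without it.
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    segments = []
    with sr.AudioFile(path) as source:
        # source.DURATION is the length of the file in seconds.
        for start, end in chunk_offsets(source.DURATION, chunk_seconds):
            # Successive record() calls read consecutive parts of the file.
            audio = recognizer.record(source, duration=end - start)
            try:
                text = recognizer.recognize_google(audio)
            except sr.UnknownValueError:
                # Raised when no intelligible speech is found in the chunk.
                text = ""
            segments.append((start, end, text))
    return segments
```

One limitation of fixed chunks is that a word straddling a chunk boundary may be transcribed badly or missed; overlapping chunks or splitting on silence are possible refinements.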