
Efficient Synergistic Acoustic Event Classifier Algorithm and Diffusion Model-based Speech Enhancement Method

Yiyuan Yang (University of Oxford)

In this seminar, two acoustic-related tasks, acoustic event classification and speech enhancement, will be discussed.

For the acoustic event classification task, we propose "A Synergistic Spectral and Learning-Based Network for Efficient Bird Sound Classification". Prevailing methods typically demand extensively labelled audio datasets and highly customized frameworks, which impose substantial computational and annotation costs. In this study, we present an efficient and general framework that combines spectral and learned features to identify different bird sounds. Empirical results on a standard field-collected bird audio dataset indicate that the method extracts features efficiently and improves bird sound classification performance, even with limited sample sizes. Furthermore, we present three feature fusion strategies and analyse them quantitatively to help engineers and researchers choose among them.
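To make the idea of combining spectral and learned features concrete, here is a minimal sketch of three generic ways to fuse two feature vectors. The abstract does not name the three fusion strategies used in the work, so the functions below (fuse_concat, fuse_weighted_sum, fuse_max) and the alpha parameter are hypothetical illustrations of common fusion choices, not the author's method.

```python
# Illustrative only: generic fusion strategies, NOT the three strategies
# from the talk (the abstract does not specify them).

def fuse_concat(spectral, learned):
    """Concatenate the two feature vectors into one longer vector."""
    return list(spectral) + list(learned)


def fuse_weighted_sum(spectral, learned, alpha=0.5):
    """Blend two equal-length vectors; alpha weights the spectral part."""
    if len(spectral) != len(learned):
        raise ValueError("feature vectors must have the same length")
    return [alpha * s + (1.0 - alpha) * l for s, l in zip(spectral, learned)]


def fuse_max(spectral, learned):
    """Keep the element-wise maximum of two equal-length vectors."""
    if len(spectral) != len(learned):
        raise ValueError("feature vectors must have the same length")
    return [max(s, l) for s, l in zip(spectral, learned)]


if __name__ == "__main__":
    spectral_features = [1.0, 5.0]  # e.g. hand-crafted spectral descriptors
    learned_features = [3.0, 4.0]   # e.g. an embedding from a trained network
    print(fuse_concat(spectral_features, learned_features))
    print(fuse_weighted_sum(spectral_features, learned_features, alpha=0.5))
    print(fuse_max(spectral_features, learned_features))
```

Concatenation preserves all information but increases the input size of the downstream classifier, whereas the sum and max variants keep the dimensionality fixed and require both feature sets to share a length.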

For the speech enhancement task, we propose an efficient speech enhancement framework based on an improved diffusion model. The goal is to extract clear speech signals from audio recordings degraded by acoustic interference. Previous methods primarily rely on generative models to overcome the limited generalization capabilities of discriminative models. Our approach instead addresses two critical shortcomings in the field: subpar real-time processing capabilities and ineffective handling of data with a low signal-to-noise ratio (SNR). With an improved diffusion model at its core, the framework is designed to support robust speech enhancement in challenging acoustic environments.
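For readers unfamiliar with the term, SNR is commonly expressed in decibels as 10 * log10(signal power / noise power). The sketch below computes it for a clean reference signal and its noisy version, treating the sample-wise difference as the noise. The function name snr_db and the toy signals are assumptions made for illustration and are not part of the proposed framework.

```python
import math


def snr_db(clean, noisy):
    """Return the SNR in dB, treating (noisy - clean) as the noise signal."""
    if len(clean) != len(noisy) or not clean:
        raise ValueError("signals must be non-empty and of equal length")
    # Average power of the clean signal and of the residual noise.
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    if noise_power == 0.0:
        return math.inf   # no noise at all
    if signal_power == 0.0:
        return -math.inf  # silence buried in noise
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    noisy = [1.1, -0.9, 1.1, -0.9]  # constant offset of 0.1 acts as noise
    print(f"SNR: {snr_db(clean, noisy):.2f} dB")
```

A lower value means the noise is stronger relative to the speech; at 0 dB the two have equal power, which is the kind of low-SNR condition the abstract describes as difficult.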

 

 
