Blind Estimation of the Subband Reverberation Time

Abstract

Reverberation, in psychoacoustics and acoustics, refers to the prolongation of sound in an enclosure after the sound source is switched off. A common measure of this property is the reverberation time (RT). Blind estimation of the subband RT remains challenging, not least because determining the ground truth with the Schroeder method is itself demanding and tedious. The aim of this research work was to investigate known and novel model-based approaches for the estimation of the subband RT. In a first step, a recently presented approach that estimates the subband RT by extrapolating the RTs of the higher subbands from the estimates of the lower subbands [2] was investigated and compared with the related approaches presented in [1]. The model described in [1] achieved the lowest average error per subband for all tested room impulse responses (RIRs). Developing a model for the subband RT requires large databases with ground truth data for the subband RTs. However, many RIR databases do not contain this ground truth data. Therefore, an approach to estimate the subband RT from a given RIR by means of the Schroeder method was developed. In a next step, two model-based approaches [3] using discrete cosine transform (DCT) filterbanks were evaluated. Based on this and previous results, a polynomial regression model for the subband RTs was investigated. The analysis shows that it is possible to fit a model using DCT filterbanks, but the result depends strongly on the acoustic properties of the room and on the fullband RT range, which makes it difficult to find a universal regression model with the known approaches. For a 30-channel DCT filterbank and across all fullband RT ranges, the mean absolute error per subband was 0.0400s for the higher subbands (from subband 15 upwards) and 0.1010s over all subbands.
Finally, the use of artificial neural networks (ANNs) for estimating the fullband RT from subband RTs, as well as for estimating the subband RT from a given RIR, was investigated. The ANN model that estimates the fullband RT from subband RTs achieved a mean square error (MSE) of 0.0048s with a 30-channel DCT filterbank and 0.0062s with an octave filterbank. For the estimation of the subband RT from a given RIR, the MSE per subband was 0.0320s using a 30-channel DCT filterbank.
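The Schroeder method mentioned in the abstract can be illustrated with a minimal sketch. This is not the implementation used in this work: the function name `schroeder_rt60`, the fit range of -5 dB to -35 dB, and the synthetic test signal are all assumptions chosen for illustration. The sketch backward-integrates the squared RIR to obtain the energy decay curve, fits a line to part of the decay in dB, and extrapolates the slope to a 60 dB drop.

```python
import numpy as np


def schroeder_rt60(rir, fs, fit_start_db=-5.0, fit_end_db=-35.0):
    """Estimate the reverberation time (60 dB decay) from an RIR.

    Hypothetical helper for illustration; the fit range is an assumption.
    """
    rir = np.asarray(rir, dtype=float)
    energy = rir ** 2

    # Schroeder backward integration: energy remaining from each sample onward.
    edc = np.cumsum(energy[::-1])[::-1]

    # Normalized energy decay curve in dB (tiny offset avoids log of zero).
    edc_db = 10.0 * np.log10(edc / edc[0] + np.finfo(float).tiny)

    # Fit a straight line to the chosen part of the decay.
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= fit_start_db) & (edc_db >= fit_end_db)
    if mask.sum() < 2:
        raise ValueError("Not enough decay range in the RIR for the line fit.")
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second

    # Extrapolate the fitted slope to a 60 dB decay.
    return -60.0 / slope


# Synthetic RIR: white noise with an exponential envelope and a known decay time.
fs = 16000
t60_true = 0.5
rng = np.random.default_rng(0)
t = np.arange(int(1.5 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-3.0 * np.log(10.0) * t / t60_true)

t60_est = schroeder_rt60(rir, fs)
print(f"true: {t60_true:.3f} s, estimated: {t60_est:.3f} s")
```

Under the same assumptions, a subband RT could be obtained by passing the RIR through one band of a filterbank before calling the function. Measured RIRs also contain a noise floor, which typically calls for truncating the integration, a step this sketch omits.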

Visualization of predicted fullband T60s for DCT and octave filterbanks

Prediction results for fullband T60 using DCT filterbanks (Achieves 0.0048s mean square error)

Prediction results for fullband T60 using Octave filterbanks (Achieves 0.0062s mean square error)

Visualization of predicted subband T60s for DCT filterbanks

Demo prediction results for subband T60 using DCT filterbanks (Achieves 0.0320s mean square error overall)

[1] S. Li, R. Schlieper, and J. Peissig, “A hybrid method for blind estimation of frequency dependent reverberation time using speech signals,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, May 2019.
[2] H. W. Löllmann, A. Brendel, W. Kellermann, and P. Vary, “Single-channel maximum-likelihood T60 estimation exploiting subband information,” Proceedings of the ACE Challenge Workshop, pp. 1–5, October 2015.
[3] M. Jeub, “Joint dereverberation and noise reduction for binaural hearing aids and mobile phone,” PhD thesis, published at RWTH Aachen University, 2012.

For a detailed discussion of the different model architectures and algorithms used in this work, please read the report and see the presentation.