Head-Related Transfer Function Upsampling Using an Autoencoder-Based Generative Adversarial Network With Evaluation Framework
Abstract
Accurate head-related transfer functions (HRTFs) are essential for delivering realistic 3D audio experiences. However, obtaining personalized, high-resolution HRTFs for individual users is a time-consuming and costly process, typically requiring extensive acoustic measurements. To address this, spatial upsampling techniques have been developed to estimate high-resolution HRTFs from sparse, low-resolution acoustic measurements. This paper presents a novel approach that leverages the spherical harmonic domain and an autoencoder generative adversarial network to tackle the HRTF upsampling problem. Comprehensive evaluations are conducted using both perceptual models and objective spectral metrics to validate the accuracy and realism of the upsampled HRTFs. The results show that the proposed approach outperforms traditional barycentric interpolation in terms of log-spectral distortion, particularly in extreme sparsity scenarios involving fewer than 12 measurements. These results support the claim that the proposed autoencoder generative adversarial network approach can generate high-quality, high-resolution HRTFs from only a few acoustic measurements, helping to pave the way for more accessible personalized spatial audio across a range of applications.
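For reference, the log-spectral distortion metric used in the evaluation is conventionally computed as the root-mean-square difference, in dB, between reference and upsampled magnitude spectra, averaged over frequency bins and measurement positions. The following is a minimal sketch of that standard metric; the function name, array layout, and averaging choices are illustrative assumptions rather than the paper's implementation.

import numpy as np

def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """Log-spectral distortion (dB) between two sets of HRTF magnitude spectra.

    h_ref, h_est: arrays of shape (..., num_freq_bins) holding magnitude
    responses at matching frequency bins (hypothetical layout; leading axes
    might index measurement positions and ears).
    """
    # Per-bin log-magnitude difference in dB; eps avoids log of zero.
    ratio_db = 20.0 * np.log10((np.abs(h_ref) + eps) / (np.abs(h_est) + eps))
    # RMS over frequency bins gives one LSD value per position/ear.
    lsd_per_position = np.sqrt(np.mean(ratio_db ** 2, axis=-1))
    # Average over any remaining axes for a single summary figure.
    return float(np.mean(lsd_per_position))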