AudioFab: Building A General and Intelligent Audio Factory through Tool Learning

Currently, artificial intelligence is profoundly transforming the audio domain; however, numerous advanced algorithms and tools remain fragmented, lacking a unified and efficient framework to unlock their full potential. Existing audio agent frameworks often suffer from complex environment configurations and inefficient tool collaboration. To address these limitations, we introduce AudioFab, an open-source agent framework aimed at establishing an open and intelligent audio-processing ecosystem. Compared to existing solutions, AudioFab's modular design resolves dependency conflicts, simplifying tool integration and extension. It also optimizes tool learning through intelligent selection and few-shot learning, improving efficiency and accuracy in complex audio tasks. Furthermore, AudioFab provides a user-friendly natural language interface tailored for non-expert users. As a foundational framework, AudioFab's core contribution lies in offering a stable and extensible platform for future research and development in audio and multimodal AI. The code is available at https://github.com/SmileHnu/AudioFab.

Keywords

46 Information and Computing Sciences, 4602 Artificial Intelligence, Networking and Information Technology R&D (NITRD), Bioengineering, Machine Learning and Artificial Intelligence

Journal Title

Proceedings of the 33rd ACM International Conference on Multimedia

Conference Name

Proceedings of the 33rd ACM International Conference on Multimedia

Publisher

Association for Computing Machinery (ACM)

Publisher DOI

https://doi.org/10.1145/3746027.3756869

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

National Natural Science Foundation of China

Collections

University of Cambridge Research Outputs (Articles and Conferences)