arXiv preprint 2025
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Shunian Chen*, Xinyuan Xie*, Zheshu Chen*, Liyan Zhao, Owen Lee, Zhan Su, Qilin Sun, Benyou Wang
Abstract
FusionAudio-1.2M is a large-scale dataset for fine-grained audio captioning that uses multimodal contextual fusion.