Multimodal music description

Ongoing research on audio description has yielded the open-source Essentia C++ library for the automatic description of music signals, as well as technology for improving and personalising algorithms through user feedback (Urbano & Schedl, 2013). However, limitations remain in algorithm accuracy (a "glass ceiling" effect), in generalisability and robustness to arbitrary music material, and in evaluation methods that rely on comparison against expert ground truth. There is also a semantic gap between the extracted features and expert knowledge. In the field of video description, human-in-the-loop mechanisms have been proposed to assist complex video face clustering tasks in real-world broadcast concert videos (Bazzica et al., 2016). Finally, dataset reliability and statistical indicators thereof (Urbano et al., 2013; Urbano, 2016) are under active research.
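The semantic gap mentioned above can be made concrete with a minimal sketch (illustrative pure Python, not Essentia's actual API): two classic low-level signal descriptors, zero-crossing rate and RMS energy, computed on a synthesized sine tone. Descriptors at this level are far removed from expert vocabulary such as genre, mood, or ensemble type, which is precisely the gap the research targets.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Synthesize one second of a 440 Hz sine tone at an 8000 Hz sample rate.
sr = 8000
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

print(zero_crossing_rate(signal))  # roughly 2 * 440 / sr, i.e. twice per cycle
print(rms_energy(signal))          # roughly 1 / sqrt(2) for a full-scale sine
```

Such descriptors are cheap and fully automatic, but mapping them to the terms an expert listener would use is what requires the multimodal and crowd-assisted methods discussed here.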

Projected TROMPA advances include the generation of meaningful multimodal descriptors for different types of users, and advances in the automatic description of ensembles (e.g. choirs). With regard to flexibility towards broad audiences, crowd-based evaluation and adaptation mechanisms will be researched. In terms of data quality, algorithms will be improved to deal with user-generated recordings. Finally, research will address the impact of user variability on the creation of datasets and the evaluation of algorithms, as well as the assessment of optimal dataset design.
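The impact of user variability on dataset creation can be illustrated with a small sketch (hypothetical data, not a TROMPA method): Cohen's kappa, a standard chance-corrected measure of agreement between two annotators, computed in pure Python on invented mood labels for ten music excerpts.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Hypothetical mood labels from two annotators for ten excerpts.
ann1 = ["happy", "sad", "happy", "calm", "sad", "happy", "calm", "sad", "happy", "calm"]
ann2 = ["happy", "sad", "happy", "calm", "happy", "happy", "calm", "sad", "sad", "calm"]

print(round(cohens_kappa(ann1, ann2), 3))  # 0.697
```

Low agreement between annotators caps the accuracy any algorithm can meaningfully reach on such a dataset, which motivates the research on reliability indicators and optimal dataset design cited above.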



Bazzica, A., Liem, C.C.S. & Hanjalic, A., "Exploiting Scene Maps and Spatial Relationships in Quasi-Static Scenes for Video Face Clustering", Image and Vision Computing, 2016.

Urbano, J., Marrero, M. & Martín, D., "On the Measurement of Test Collection Reliability", Proc. SIGIR, 2013.

Urbano, J. & Schedl, M., "Minimal Test Collections for Low-Cost Evaluation of Audio Music Similarity and Retrieval Systems", IJMIR, 2(1), pp. 59-70, 2013.

Urbano, J., "Test Collection Reliability: A Study of Bias and Robustness to Statistical Assumptions via Stochastic Simulation", Information Retrieval Journal, 19(3), pp. 313-350, 2016.