Crowd Evaluation Methodologies

Author	Alessandro Bozzon, Geert-Jan Houben, Jaehun Kim, Cynthia Liem, Christoph Lofi, Ioannis Samiotis, Julián Urbano
Abstract	The aim of Task 4.1 – “Crowd-powered Improvement” is to design and implement a framework for the continuous evaluation and improvement of the automatic technologies in WP3, as well as their combination with the crowd to enrich music content. This task builds on the workflow definitions in Task 4.4 – “Campaign Design” and the user modeling in Task 4.2 – “Annotators”. This document presents a formal framework and demonstrates its application to the Music Emotion Recognition systems developed in Task 3.2 – “Music Description”. Large-scale, sustainable data enrichment is only achievable with automatic technology capable of producing new data descriptions. In a traditional setting, this is done in four general phases: annotation of a training corpus; development of systems with this corpus; evaluation of system performance and possible refinement; and enrichment of a target corpus by applying these systems. However, if a new kind of descriptor is to be used and new technology created for it, or if existing technology has to be adapted to classical music, ground truth data is needed both to train systems and to evaluate their quality. Crowdsourcing may be a viable way to collect these annotations, but in the long term, and in an evolving, large-scale setting like TROMPA, its sustainability is at risk. It is therefore essential that the crowd provide annotations where they are needed most.
The setting laid out for TROMPA consists of: annotating a few examples of the target corpus to be enriched; developing or refining systems with these data; training a combination model that estimates pending annotations from the system outputs, the known annotations and other external data sources; using these estimated annotations to enrich the corpus and to estimate system performance; and determining which examples to annotate next so that they maximize value in the next iteration, whether for system development, enrichment, or evaluation precision.
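One iteration of this setting can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the framework defined in the deliverable: it assumes numeric annotations (for example, a valence rating per music excerpt), uses inverse-error weighting of the system outputs as a stand-in combination model, and picks the next examples to annotate by how much the systems disagree. All function names and data are illustrative assumptions.

```python
from statistics import fmean

def fit_weights(outputs, known):
    """Hypothetical combination model: weight each system by the inverse of
    its mean squared error on the examples that already have annotations."""
    n_systems = len(next(iter(outputs.values())))
    weights = []
    for s in range(n_systems):
        mse = fmean((outputs[i][s] - y) ** 2 for i, y in known.items())
        weights.append(1.0 / (mse + 1e-6))  # small constant avoids division by zero
    total = sum(weights)
    return [w / total for w in weights]

def estimate(outputs, weights):
    """Per example: estimated annotation (weighted mean of system outputs)
    and system disagreement (weighted variance around that mean)."""
    est = {}
    for i, scores in outputs.items():
        mean = sum(w * x for w, x in zip(weights, scores))
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, scores))
        est[i] = (mean, var)
    return est

def iteration(outputs, known, batch_size):
    """One assumed loop step: combine, enrich, evaluate, select."""
    weights = fit_weights(outputs, known)
    est = estimate(outputs, weights)
    # Enrichment: keep known annotations, use estimates for pending examples.
    enriched = {i: known.get(i, est[i][0]) for i in outputs}
    # Estimated error of each system against the enriched labels.
    system_mse = [fmean((outputs[i][s] - enriched[i]) ** 2 for i in outputs)
                  for s in range(len(weights))]
    # Selection: annotate next where the systems disagree most.
    pending = [i for i in outputs if i not in known]
    next_batch = sorted(pending, key=lambda i: est[i][1], reverse=True)[:batch_size]
    return enriched, system_mse, next_batch

if __name__ == "__main__":
    # Toy data: outputs of two systems for four excerpts, one of them annotated.
    outputs = {"a": [0.2, 0.3], "b": [0.8, 0.1], "c": [0.5, 0.5], "d": [0.9, 0.7]}
    known = {"a": 0.25}
    enriched, system_mse, next_batch = iteration(outputs, known, batch_size=2)
    print(next_batch)  # prints ['b', 'd']
```

Disagreement is only one possible selection criterion; the abstract also mentions choosing examples for their value to system development or evaluation precision, which would call for a different scoring function in place of the variance.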
Year of Publication	2020
URL	https://trompamusic.eu/deliverables/TR-D4.1-Crowd_Evaluation_Methodologies.pdf

This project has received funding from the European Union's Horizon 2020 research and innovation programme H2020-EU.3.6.3.1. - Study European heritage, memory, identity, integration and cultural interaction and translation, including its representations in cultural and scientific collections, archives and museums, to better inform and understand the present by richer interpretations of the past under grant agreement No 770376.