TY - RPRT
AU - Alessandro Bozzon
AU - Geert-Jan Houben
AU - Jaehun Kim
AU - Cynthia Liem
AU - Christoph Lofi
AU - Ioannis Samiotis
AU - Julián Urbano
AB - The aim of Task 4.1 – “Crowd-powered Improvement” is to design and implement a framework for the continuous evaluation and improvement of the automatic technologies in WP3, as well as their combination with the crowd to enrich music content. This task builds on top of the workflow definitions in Task 4.4 – “Campaign Design” and the user modeling in Task 4.2 – “Annotators”. This document presents a formal framework and demonstrates its application in the context of the Music Emotion Recognition systems developed in Task 3.2 – “Music Description”. Large-scale and sustainable data enrichment is only achievable through the use of automatic technology capable of producing new data descriptions. In a traditional setting, this is achieved in four general phases: annotation of a training corpus; development of systems with this corpus; evaluation of system performance and possible refinement; and enrichment of a target corpus through the application of these systems. However, if a new kind of descriptor is to be used and new technology created for it, or if existing technology has to be adapted to classical music, ground truth data is needed to train systems and evaluate their quality. Crowdsourcing may be a viable alternative for collecting annotations, but in the long term, and in an evolving and large-scale setting like TROMPA, sustainability can be at risk. It is therefore very important that the crowd provide annotations where they are needed the most. The setting laid out for TROMPA consists of: annotating a few examples of the target corpus to be enriched; developing or refining systems with these data; training a combination model to estimate pending annotations based on the output from systems, known annotations and other external sources of data; using these estimated annotations to enrich the corpus and estimate system performance; and determining which examples will be annotated next such that they maximize value in a new iteration in terms of system development, enrichment, or evaluation precision.
PY - 2020
TI - Crowd Evaluation Methodologies
UR - https://trompamusic.eu/deliverables/TR-D4.1-Crowd_Evaluation_Methodologies.pdf
ER -