Deliverable 5.4 Music Performance Assessment Mechanisms

One of the aims of TROMPA is to formalise expert (musicologists and educators) and crowd (music
enthusiasts) knowledge on various aspects of performances and musical scores in terms of
performance quality (such as intonation and voice quality in case of singing or technical brilliance in
case of instrumental music) and of piece difficulty (i.e., the difficulty of performing a piece as a singer
or instrumental player). Further, TROMPA aims to develop automated models of these kinds of
assessments, built and systematically validated using human feedback.
Automatic assessment of score difficulty is far from trivial. The notion of “difficulty” is in fact a
complex construct, involving both cognitive-structural aspects (“difficulty of understanding”) and
motoric-physiological aspects (“difficulty of physically realising”). The ability to tackle facets of both
aspects of difficulty depends upon (and hence must be understood relative to) the expertise
and skill level of the performer. Further, the motoric-physiological aspects depend strongly on the
characteristics of the instrument, the tempo of a performance, and the playing style called
for by the piece or preferred by the individual performer. As such, different experts may disagree in
their assessment of piece difficulty, depending on how they weigh these diverse aspects in their
judgement; automatic algorithms that determine such a measure must therefore be understood
as coarse-grained abstractions of the many facets influencing the performance difficulty of a
particular score.
Similarly, a performance’s quality (in the sense of “goodness”) is difficult to pin down,
representing a highly subjective notion likely to confound consistent ratings even among human
Section 2 of this deliverable summarises the lower-level descriptors that in aggregate may be
used to approximate notions of quality and difficulty as described above. These descriptors may be
i) obtained explicitly from human judgements gathered through crowd-sourcing, user interactions, or
sources in the literature; ii) obtained implicitly from musicians’ behaviours during musical performances;
or iii) derived algorithmically from performance audio recordings, MIDI streams, and other
information modalities, which we briefly summarise in Section 3.
In Section 4, we describe mechanisms envisioned to determine these lower-level descriptors
using technologies from TROMPA deliverables D3.2 and D3.5. These comprise:
● User judgements obtained via tools and methods developed in D5.2 (Digital Score Edition)
and D5.5 (Annotation tools). These provide individual ratings of difficulty aspects of a score
(for individual sections or entire pieces) or of specific aspects of a performance, such as
overall performance quality (high, low), expressivity (expressive, mechanical), and
tempo and dynamics (too slow, too fast, too loud, too soft).
● Expert assessments of difficulty harvested from the pedagogical literature, providing
reference data (“difficulty indices”) for the training of WP3 technologies, as well as metadata
available for consumption by end users.
● Measures of performance “errors” (that is, deviations from the notated score) identified
during performance-to-score alignment (D3.5) and through characterisation of intonation
accuracy and timing deviation (D3.2), providing a crude indication of performance “quality”,
as well as a signature of score difficulty: score sections that consistently produce
errors across performances presumably exhibit greater difficulty than sections that tend
to be performed more accurately.
● Quantifications of the output of individual performers (instrumentalists or singers) over time,
investigating the number of rehearsal repetitions, and the rate of performance improvement
across rehearsal sessions, as signatures of piece difficulty.
● Quantifications of score difficulty according to motoric-physiological requirements of its
performance (tempo; attack density; hand displacement; fingering; mastery of specialised
performance techniques).
● Quantifications of score difficulty according to cognitive-structural requirements (harmonic
and rhythmic complexity; deviations from key; information-theoretic compressibility of the
● Consequently, quantifications of both motoric-physiological and cognitive-structural fatigue
liable to be produced by the performance of a piece.
● Measures of performance quality determined in aggregate from performances of a
particular musical piece (on the notion, grounded in the literature, that performances which are
typical along particular musical parameters tend to be qualitatively better).
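To make the error-based difficulty signature above concrete, the following is a minimal sketch of how alignment errors might be pooled per score section across several performances. The record layout (the `performance`, `section`, `notes` and `errors` fields) and the sample values are hypothetical illustrations only; they are not the output format of the D3.5 alignment tools.

```python
from collections import defaultdict

# Hypothetical alignment output: one record per (performance, score section),
# giving the number of notated notes and the number flagged as deviations.
alignments = [
    {"performance": "p1", "section": "A", "notes": 40, "errors": 2},
    {"performance": "p1", "section": "B", "notes": 50, "errors": 10},
    {"performance": "p2", "section": "A", "notes": 40, "errors": 0},
    {"performance": "p2", "section": "B", "notes": 50, "errors": 15},
]


def section_error_rates(records):
    """Pool errors per section across performances; return errors / notes."""
    notes = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        notes[record["section"]] += record["notes"]
        errors[record["section"]] += record["errors"]
    # Skip sections without any notated notes to avoid division by zero.
    return {s: errors[s] / notes[s] for s in notes if notes[s] > 0}


rates = section_error_rates(alignments)
# Sections ordered from most to least error-prone.
ranked = sorted(rates, key=rates.get, reverse=True)
print(rates)
print(ranked)
```

Sections near the top of `ranked` are candidates for greater difficulty. A pooled rate of this kind remains a crude indicator, since it does not account for the skill level of the performers involved.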
Finally, in Section 5 we summarise performance assessment workflows in terms of the TROMPA data
infrastructure (D5.1), first describing interactions with the Contributor Environment via the TROMPA
Processing Library (D5.3) before detailing the workflows specific to the instrumental performers and
choral singers use cases.
Report Number
TR-D5.4-Music Performance Assessment Mechanisms v1