<?xml version="1.0" encoding="UTF-8"?>
<xml><records><record><source-app name="Bibcite" version="8.x">Drupal-Bibcite</source-app><ref-type>27</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Aggelos Gkiokas</style></author><author><style face="normal" font="default" size="100%">Emilia Gomez</style></author><author><style face="normal" font="default" size="100%">Helena Cuesta</style></author><author><style face="normal" font="default" size="100%">Olga Slizovskaia</style></author><author><style face="normal" font="default" size="100%">Juan Gomez</style></author><author><style face="normal" font="default" size="100%">Lorenzo Porcaro</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Deliverable 3.2 Music Description</style></title></titles><keywords/><dates><year><style face="normal" font="default" size="100%">2019</style></year></dates><short-title><style face="normal" font="default" size="100%">D3.2</style></short-title><urls><style face="normal" font="default" size="100%">https://trompamusic.eu/deliverables/TR-D3.2-Music_Description_v1.pdf</style></urls><abstract><style face="normal" font="default" size="100%">The aim of this task is to develop and integrate technologies for the automatic description of the&#13;
majority of the musical data described and collected in T3.1 Data Resource Preparation. Its aim is to&#13;
provide music descriptors at various modalities and levels of analysis for the target repertoires in&#13;
order to facilitate the use cases. Although this task primarily deals with audio, we also consider&#13;
symbolic and video music sources. The goals of this task can be summarized as follows:&#13;
❖ Apply and evaluate existing state-of-the-art music description methods in the domain&#13;
of classical music, focusing on the target repertoires.&#13;
❖ Develop new and expand existing methods for music description tailored to the target&#13;
repertoires.&#13;
❖ Facilitate the use cases with robust descriptors.&#13;
❖ Contribute data (descriptors) and algorithms to open repositories such as AcousticBrainz and&#13;
Zenodo in order to facilitate future research on music processing.&#13;
We will use various existing state-of-the-art methods/libraries for the extraction of descriptors. The&#13;
aim of this deliverable is to provide descriptors for three types of music modalities, namely audio,&#13;
symbolic scores, and videos.&#13;
We start from the low-level audio descriptors, such as spectral and cepstral frame-based&#13;
descriptors. Next we present mid-level audio descriptors, such as information related to&#13;
rhythm (beat locations, tempo), tonality (key, chords) and music performance (for choir singing:&#13;
intonation, tuning). Next we present higher level music descriptors, such as music similarity and&#13;
emotion classification. These descriptors relate more closely to human notions of music and must&#13;
be considered more subjective than the mid-level features. Next we present the descriptors&#13;
derived from the symbolic representations of music (MEI, MusicXML, MIDI), and finally we&#13;
describe descriptors derived from music videos (concerts, rehearsals, etc.). These descriptors fuse&#13;
information from both audio and video, and are thus considered multi-modal descriptors.&#13;
Essentia [1] will be used as the main framework for extracting state-of-the-art audio&#13;
features. Apart from Essentia, we will deploy other state-of-the-art methods of music description.&#13;
These methods will be appropriately selected in order to meet TROMPA use case requirements, and&#13;
will be adapted and further developed. Whenever possible, these methods will be contributed to&#13;
Essentia; otherwise they will be provided and integrated as individual open libraries.&#13;
The methods that will be developed and the descriptors that will be extracted are the&#13;
following:&#13;
❖ Low-level audio descriptors: We will use Essentia to extract these descriptors,&#13;
which include, among others, band energies (Bark, mel, ERB), cepstral descriptors (MFCC, BFCC),&#13;
spectral moments and time-domain features (envelope, ZCR).&#13;
❖ Harmonic-Tonal descriptors: These include various representations such as the chromagram,&#13;
Harmonic Pitch Class Profile (HPCP), chord descriptors, key, and tuning. These descriptors will be&#13;
extracted and evaluated in the context of TROMPA use cases but we do not intend to further&#13;
investigate and research the development of these descriptors. We will contribute to the&#13;
SoA by evaluating these descriptors on large amounts of Western classical&#13;
music data. Moreover, we may contribute to the SoA with new datasets (see next subsection&#13;
Human Annotations / Human Data). The harmonic/tonal descriptors can potentially be used&#13;
in all use cases that involve audio files. To this end, we have identified the following potential&#13;
uses of the harmonic/tonal descriptors:&#13;
➢ Music Enthusiasts: Harmonic/Tonal descriptors can be used to facilitate a music&#13;
recommendation/music similarity engine.&#13;
➢ Instrument Players: Harmonic/Tonal descriptors can be used to analyze existing or&#13;
new performances of instrument players.&#13;
➢ Choir Singers: As in the instrument players use case, we can use tonal descriptors&#13;
(e.g. tuning) to analyze characteristics of choir singing performances.&#13;
➢ Music Scholars: The chord descriptors can facilitate musicological research, e.g.&#13;
retrieving works with similar chord progressions.&#13;
❖ Rhythm descriptors: Rhythm descriptors contain explicit information about the rhythmic&#13;
content of a music piece, such as tempo, beats, tempo curves (tempo fluctuations), time&#13;
signature, rhythm tags and meter. We will use and adapt existing state-of-the-art rhythm&#13;
processing methods, which will be evaluated in the scope of TROMPA; if the adopted&#13;
methods do not prove to be efficient, we will develop new methods. The rhythm descriptors&#13;
can potentially be used in all use cases that involve audio files:&#13;
➢ Music Enthusiasts: Rhythm descriptors can be used to facilitate a music&#13;
recommendation/music similarity engine.&#13;
➢ Instrument Players: Rhythm descriptors such as tempo fluctuations can be used to&#13;
analyze existing or new performances of instrument players.&#13;
➢ Choir Singers: As in the instrument players use case, we can use rhythm descriptors&#13;
to analyze rhythm characteristics of choir singing performances.&#13;
❖ Singing voice analysis: Singing voice analysis involves extracting information about the&#13;
vocals of a music piece. Singing voice may appear in music pieces in several ways:&#13;
accompanied or a cappella, solo or ensemble, e.g. choirs. Many expressive properties of&#13;
singing voice can be extracted, including (but not limited to) pitch curves (curves showing&#13;
the evolution of the fundamental frequency in time), intonation (accuracy of pitch in&#13;
singing), degree of unison (the agreement between all the voice sources) and&#13;
synchronization. This particular task is primarily related to the Choir singers use case.&#13;
Moreover, we will contribute to the state-of-the-art in multi-pitch estimation, building new&#13;
models trained using choral singing data and we will develop a framework capable of&#13;
extracting intonation descriptors given an audio input and an associated score. In addition,&#13;
and taking advantage of the human-annotated data that will be gathered in the scope of&#13;
TROMPA, we plan to contribute to the state-of-the-art of singing performance rating,&#13;
combining low-level and pitch descriptors with annotations.&#13;
❖ Music similarity: In the context of the TROMPA project, we will make use of content and&#13;
context-based models for retrieving similarity measures. In particular, we will focus on&#13;
understanding the role of these measures when embedded in more complex architectures,&#13;
such as Musical Recommender Systems. Indeed, we will make use of the notion of similarity&#13;
for understanding and characterizing the complex and heterogeneous nature of the&#13;
Western Classical Music repertoire, we will compare different musical repertoires, and we&#13;
will create listening experiences for the users which can enhance the discovery of the&#13;
European Musical heritage. Music similarity estimation can be potentially used in the&#13;
following scenarios:&#13;
➢ Music Enthusiasts: help the users in the discovery of Western Classical Music.&#13;
➢ Instrument Players, Choir Singers: help musicians and singers identify, during the learning&#13;
process, the musical pieces that best fit their education, based on their&#13;
training level and personal musical taste.&#13;
❖ Emotion Tag Annotation: In the context of the TROMPA project, we will make use of the&#13;
categorical approach to annotate musical excerpts with emotion ratings. We will study the&#13;
agreement in emotion annotation and consider also personalized models. In order to analyze&#13;
annotation agreement of emotional tags, we will ask the user to provide annotations in&#13;
emotion ratings such as transcendence, peacefulness, power, joyful activation, tension,&#13;
sadness, anger, disgust, fear, surprise, tenderness on a Likert scale.&#13;
❖ Symbolic descriptors: Symbolic descriptors will be used in facilitating tasks related to the&#13;
automatic assessment of music pieces for the choir singers and the instrument players use&#13;
cases. Moreover, we will extract baseline symbolic descriptors that can potentially be&#13;
exploited in other use cases, such as the music scholars use case. We will investigate which&#13;
score features of a music piece are relevant to its playing difficulty. To do so, we will need&#13;
several types of data to train and evaluate our model including symbolic scores, annotations&#13;
about the difficulty of the pieces, and audio recordings of several renditions of the pieces&#13;
performed by different people. The symbolic descriptors are initially focused on the choir&#13;
singers and instrument players pilots. However, they can potentially be used in other pilots,&#13;
such as the music scholars pilots. This possibility will be investigated in the next months of&#13;
the project and will be reported in detail in the next version of this deliverable.&#13;
❖ Video descriptors: Video descriptors aim to provide additional semantic information&#13;
related to video recordings of musical performances. In the scope of the project, we will&#13;
focus on the following tasks revealing the potential of using video data:&#13;
➢ Video tagging: general-purpose video tagging in the domain of musical&#13;
performances, providing frame-level and video-level labels from a pre-defined&#13;
ontology.&#13;
➢ Musical instrument detection: providing the position of an object in the form of a&#13;
rectangular bounding box.&#13;
➢ Automatic object segmentation: localizing objects from a pre-defined ontology as&#13;
an enclosed free-form area at a frame level.&#13;
Video descriptors can be used in many use cases which involve video, to name a few:&#13;
➢ Instrument Players: showing fingering charts on top of a video recording.&#13;
➢ Choir Singers: counting the number of singers.&#13;
➢ Music Enthusiasts: providing a more detailed transcription for music performances,&#13;
instrument-to-instrument navigation in music videos, highlighting&#13;
playing/non-playing instruments in videos.&#13;
The automatic description tools that act on a file input (e.g. music file or video file) will be made&#13;
available as executable programs or will be integrated into Essentia, allowing us to programmatically&#13;
call the tools and retrieve the descriptors. We plan to develop a management tool that will allow us&#13;
to automatically retrieve content described in the CE, perform some computation on that content,&#13;
and then store descriptors or some other type of generated data, making it available again in the CE.&#13;
For the tools that do not operate on a simple input/output basis we plan to provide documentation&#13;
and software libraries for the developers of these tools to be able to easily retrieve data from the CE&#13;
and submit descriptors or other results for other members of the consortium to use. We will&#13;
promote the use of common file formats to store the descriptors computed by the tools presented&#13;
in this document. The final decision on the data format will be described in the next version of this&#13;
deliverable. We plan to publicly host the descriptors computed by these tools so that they can be accessed by other members of the consortium and the public. The location of these descriptors can&#13;
be obtained by querying the CE.</style></abstract></record></records></xml>
