Deliverable 5.3 TROMPA Processing Library

The TROMPA Processing Library (TPL) is responsible for providing a communication layer between the
Use Cases, the CEapi and the WP3 technologies. Whenever a use case requests any type of data or
metadata, it does so via TPL functionalities, which derive from:
❖ The Multimodal Component which allows a Pilot user to browse (and combine) Contributor
Environment (CE) references to public music data, regardless of content type. Moreover the
Multimodal Component provides a graphical interface to browse the CE data.
❖ The GraphQL Interface which provides Pilot developers rich access to the TROMPA dataset,
stored as public data references in the CE database.
❖ Extended API Functionalities, which allow the (real-time) definition, creation, completion
and consumption of WP3 technology ‘jobs’ on public music data referenced in the CE.
The processing workflow for triggering a WP3 algorithm/task is as follows:
❖ A WP3 algorithm defines a task (algorithm) template in the CE database.
❖ The use case can retrieve the potential tasks from the CE and can search for a certain
descriptor of a specific piece of data.
❖ If the descriptor does not exist for that data, the use case requests the running of a task.
❖ On reception of this request, the Extended API Functionalities component creates a node
for the specific task in the CE.
❖ The WP3 process regularly polls the CEapi for new tasks, or subscribes to the CEapi to be
notified when a new task is requested.
❖ On reception of a new task, the WP3 tool retrieves the necessary parameters, validates and
starts the task, then updates the task status in the CEapi.
❖ The tool updates the task status regularly to reflect progress.
❖ Once the WP3 tool has run and processed the requested task, it stores the output in an
external repository, then creates a reference node for the storage location in the CEapi and
updates the task status to complete.
❖ Either by polling or by subscription, the use case receives the ‘complete’ update.
❖ The use case retrieves the result reference and can create additional relations between the
result and pre-existing nodes for enriching, semantic interlinking and provenance tracking.
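The status updates in the workflow above can be pictured as a small state machine. The following is a minimal sketch, not the CEapi's actual behaviour: only ‘running’ and ‘complete’ appear in the text, while the other status names, the `Task` class and the transition table are assumptions made for illustration.

```python
from enum import Enum


class TaskStatus(Enum):
    # 'running' and 'complete' appear in the text; the other names are assumed.
    REQUESTED = "requested"
    ACCEPTED = "accepted"
    RUNNING = "running"
    COMPLETE = "complete"
    FAILED = "failed"


# Allowed transitions between statuses (an assumption, not the CEapi's rules).
ALLOWED = {
    TaskStatus.REQUESTED: {TaskStatus.ACCEPTED, TaskStatus.FAILED},
    TaskStatus.ACCEPTED: {TaskStatus.RUNNING, TaskStatus.FAILED},
    TaskStatus.RUNNING: {TaskStatus.RUNNING, TaskStatus.COMPLETE, TaskStatus.FAILED},
    TaskStatus.COMPLETE: set(),
    TaskStatus.FAILED: set(),
}


class Task:
    """A requested task whose status is advanced by the WP3 tool."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.status = TaskStatus.REQUESTED
        self.result_reference = None  # set when the task completes

    def update(self, new_status: TaskStatus, result_reference=None) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        if new_status is TaskStatus.COMPLETE and result_reference is None:
            raise ValueError("a completed task needs a result reference")
        self.status = new_status
        if result_reference is not None:
            self.result_reference = result_reference


task = Task("task-1")
task.update(TaskStatus.ACCEPTED)
task.update(TaskStatus.RUNNING)
task.update(TaskStatus.RUNNING)  # progress update, status unchanged
task.update(TaskStatus.COMPLETE, result_reference="https://example.org/results/task-1.json")
```

Requiring a result reference before the status can become ‘complete’ mirrors the order of the steps above: the output is stored and referenced first, and only then is the task marked as complete.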
The CEapi’s main client interface is an implementation of GraphQL. The internal data model of the CE
is based on base level types of the model and a number of extensions that
fit the needs of the TROMPA project. GraphQL broadly supports three types of functionalities that
are accessible through the GraphQL API interface:
❖ Queries: Queries allow clients to search for data (nodes) in the GraphQL database. They are
used to search and retrieve data nodes in a “read-only” mode.
❖ Mutations: Contrary to queries, mutations are operations that add, remove or update data in the
database.
❖ Subscriptions: Allow clients to subscribe to changes in the dataset, for example after a
particular node was updated or a particular type of node was created. Once such a change
event occurs, the client will be notified in real time.
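As an illustration of the difference between a query and a mutation, the sketch below builds the JSON bodies that a client would typically send in an HTTP POST to a GraphQL endpoint. The operation and field names (`MusicComposition`, `CreateDigitalDocument`, `identifier`, `title`, `source`) are hypothetical placeholders and are not taken from the actual CEapi schema; no network call is made.

```python
import json
from typing import Optional


def graphql_payload(document: str, variables: Optional[dict] = None) -> str:
    """Serialise a GraphQL operation into a JSON request body.

    Queries and mutations both travel under the "query" key; the keyword at
    the start of the document tells the server which kind of operation it is.
    """
    body = {"query": document}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


# Hypothetical read-only query: field and type names are assumptions.
SEARCH_WORKS = """
query FindWorks($title: String!) {
  MusicComposition(title: $title) { identifier title }
}
"""

# Hypothetical mutation adding a node: field and type names are assumptions.
ADD_DOCUMENT = """
mutation AddDocument($title: String!, $source: String!) {
  CreateDigitalDocument(title: $title, source: $source) { identifier }
}
"""

search_request = graphql_payload(SEARCH_WORKS, {"title": "Missa"})
add_request = graphql_payload(
    ADD_DOCUMENT,
    {"title": "PCP features", "source": "https://example.org/pcp.json"},
)
```

Subscriptions use the same document syntax with the `subscription` keyword, but are usually delivered over a persistent connection such as a websocket rather than a single HTTP request.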
The CE database will contain references to public online musical data in many different formats. The
Multimodal Component helps Pilot developers retrieve that assorted data and present it to
users in a comprehensive way. To this end, the Multimodal Component offers ‘shortcut’ access to
CEapi GraphQL functionalities plus example User Interface (UI) components that give users access to
these functionalities and enable them to consume the rich TROMPA data set. The Multimodal
Component offers a GUI to search the TROMPA data set as contained in the CE database, applying
the mockups as researched and produced in Task 5.3 - Multimodal Integration of Music Data.
Currently, the only shortcut functionality that has been worked out in the Multimodal Component is a
search component that is able to disclose all the public musical data references stored in the CE.
The TROMPA Processing Library also provides a mechanism by which the technologies of WP3 can be
invoked by the user pilots (WP6) through the Contributor Environment and the GraphQL API. At its
most basic, this integration mechanism offers automation for the following process:
● User Pilot chooses target content, referenced in CE database
● User Pilot user creates a job to run a process on this content
● WP3 Process picks up job
● WP3 Process executes job on target content, creating and storing a result
● WP3 Process writes reference to result in CE database
● User Pilot picks up result
● User Pilot user consumes result
Process jobs are maintained as GraphQL nodes in the database. The generic Component-CE-WP3
interaction solution is based on a compatible data model that can be broken down into
three parts:
● Template Nodes: They are maintained by WP3 developers and used by User Pilots. The
Template Nodes represent generic algorithmic processes, e.g. “the extraction of Pitch Class
Profiles (PCP) features of an audio piece”.
● Instance Nodes: They are created by User Pilots and maintained by CE. Instance Nodes
correspond to specific tasks requested by User Pilots, e.g. “the extraction of PCP features of
the audio piece X”.
● Public Nodes: They represent the (public) content on which the WP3 process is run (the
audio piece X) and the corresponding results (the PCP features).
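The three kinds of nodes can be sketched with plain data classes. This is an illustrative model only: the class and field names, the `instantiate` helper and the `frame_size` setting are assumptions and do not reflect the actual CE data model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TemplateNode:
    """Generic algorithmic process, maintained by WP3 developers."""
    name: str
    parameters: Tuple[str, ...]  # names of the settings a job must supply


@dataclass(frozen=True)
class PublicNode:
    """Reference to public content, or to a result stored publicly."""
    identifier: str
    url: str


@dataclass
class InstanceNode:
    """A specific task requested by a User Pilot."""
    identifier: str
    template: TemplateNode
    target: PublicNode
    settings: dict
    result: Optional[PublicNode] = None  # filled in once the job is done


def instantiate(template: TemplateNode, target: PublicNode, settings: dict) -> InstanceNode:
    """Create a job from a template, checking that all settings are given."""
    missing = set(template.parameters) - set(settings)
    if missing:
        raise ValueError(f"missing settings: {sorted(missing)}")
    return InstanceNode(str(uuid.uuid4()), template, target, dict(settings))


# "frame_size" is a made-up setting name used only for this example.
pcp_template = TemplateNode("PCP feature extraction", ("frame_size",))
audio_x = PublicNode("audio-x", "https://example.org/audio/x.wav")
job = instantiate(pcp_template, audio_x, {"frame_size": 4096})
```

Here `pcp_template` plays the role of “the extraction of PCP features of an audio piece”, while `job` corresponds to “the extraction of PCP features of the audio piece X”.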
Each algorithm must “subscribe” itself to the Contributor Environment. The procedure to do so is
summarized as:
❖ Create a SoftwareApplication node in the CE: This node does not correspond to the specific
algorithm, but rather to the software/library that hosts the specific algorithm.
❖ Create an EntryPoint node in the CE: The entry point corresponds to a specific
algorithm/method to be run.
❖ Create an actionApplication relation between SoftwareApplication and EntryPoint: This
relation defines that this EntryPoint is a part of the SoftwareApplication node.
❖ Create a ControlAction template node: This ControlAction node will be the model for the
‘job’ created for each specific algorithm process request. Each request will result in a copy of
this ControlAction node being created (instantiated), which will then represent the ‘job’ that
can be acted on and followed.
❖ Create potentialAction relation between EntryPoint and (template) ControlAction: Relates
the ControlAction template to the specific EntryPoint.
❖ Create Property template node: Corresponds to existing nodes in the CE database, probably
referencing a content file at some public repository. These will be the inputs to the WP3 process.
❖ Create PropertyValueSpecification: Corresponds to a scalar parameter (string, number,
on/off checkbox) that needs to be given by the user as ‘settings’ inputs, in order to tune the
algorithm process.
❖ Create object relation between ControlAction and Property/PropertyValueSpecification
template nodes respectively.
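The registration steps above amount to creating five nodes and four relations. The sketch below replays them against a toy in-memory graph instead of the CEapi. The `Graph` class, the identifier scheme and the “frame size” setting are assumptions, and the direction chosen for each relation follows the usual schema.org reading of these property names rather than anything stated in the text.

```python
class Graph:
    """A toy stand-in for the CE database: typed nodes plus named relations."""

    def __init__(self) -> None:
        self.nodes = {}      # identifier -> (node type, name)
        self.relations = []  # (source identifier, relation name, target identifier)

    def add_node(self, node_type: str, name: str) -> str:
        identifier = f"{node_type}-{len(self.nodes) + 1}"
        self.nodes[identifier] = (node_type, name)
        return identifier

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a relation must exist")
        self.relations.append((source, relation, target))


def register_algorithm(graph: Graph, application: str, algorithm: str,
                       input_name: str, setting_name: str) -> str:
    """Replay the registration steps and return the EntryPoint identifier."""
    app = graph.add_node("SoftwareApplication", application)
    entry = graph.add_node("EntryPoint", algorithm)
    graph.relate(entry, "actionApplication", app)
    action = graph.add_node("ControlAction", f"{algorithm} (template)")
    graph.relate(entry, "potentialAction", action)
    prop = graph.add_node("Property", input_name)
    spec = graph.add_node("PropertyValueSpecification", setting_name)
    graph.relate(action, "object", prop)
    graph.relate(action, "object", spec)
    return entry


graph = Graph()
entry_point = register_algorithm(
    graph, "Essentia", "PCP extraction", "audio file", "frame size"
)
```

In a real deployment each `add_node` and `relate` call would correspond to one GraphQL mutation, with the identifiers returned by the CEapi rather than generated locally.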
Each algorithm is invoked by the user pilots, using the RequestControlAction.
❖ The user pilot makes a RequestControlAction for a specific EntryPoint. The CEapi translates
this request by creating a set of nodes on the basis of the EntryPoint/ControlAction
template and subsequently responds with the thus created ControlAction, including its
unique identifier. With this identifier, the user pilot can then subscribe to the CEapi and
receive a notification each time the ControlAction thus created is updated.
❖ From the WP3 program/algorithm perspective, there are two ways to get notified that a new
task has to be run:
➢ Subscribe to the CEapi on RequestControlAction requests on a specific EntryPoint,
through the websocket: This functionality offers the possibility to handle user
requests for algorithms in real-time.
➢ Frequently check for tasks in the CE: It is possible to query for ControlActions
created on the basis of a certain EntryPoint. In this way, the software developed
under WP3 can check if new tasks have to be run at convenient times or intervals.
This functionality allows WP3 software to handle user requests for algorithms on its own schedule.
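The polling option can be sketched as a function that is called at intervals and returns only the jobs it has not seen before. This is a minimal sketch under stated assumptions: the record layout (`identifier`, `entry_point`), the identifiers and the in-memory list standing in for the CE are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


def poll_new_jobs(fetch: Callable[[], Iterable[dict]], seen: Set[str]) -> List[dict]:
    """Return the jobs not handled before and remember their identifiers."""
    new_jobs = []
    for job in fetch():
        if job["identifier"] not in seen:
            seen.add(job["identifier"])
            new_jobs.append(job)
    return new_jobs


# Stand-in for the CE: a list of ControlAction-like records.
ce_jobs = [
    {"identifier": "ca-1", "entry_point": "ep-pcp"},
    {"identifier": "ca-2", "entry_point": "ep-other"},
]


def fetch_pcp_jobs() -> List[dict]:
    # In a real deployment this would be a GraphQL query against the CEapi,
    # filtered on the EntryPoint this WP3 software is responsible for.
    return [job for job in ce_jobs if job["entry_point"] == "ep-pcp"]


seen: Set[str] = set()
first = poll_new_jobs(fetch_pcp_jobs, seen)    # picks up ca-1
ce_jobs.append({"identifier": "ca-3", "entry_point": "ep-pcp"})
second = poll_new_jobs(fetch_pcp_jobs, seen)   # picks up only ca-3
```

Keeping the set of seen identifiers outside the function lets the caller decide how often to poll, for example from a scheduler or a simple timed loop.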
Once the algorithm has been invoked, it can update the status of the ControlAction item created for
its process until the process is completed:
❖ Once the algorithm receives the job, it can update the status of the ControlAction node to
❖ While the algorithm is running, it can update the status of the ControlAction node in order to
provide more information (e.g. ‘running’).
❖ Once the process has completed, the algorithm process application should write the result
to a public location and add a reference to this result in the CE database.
❖ With the identifier obtained from the response of the result node (e.g. DigitalDocument)
creation, create a result relation between ControlAction and DigitalDocument.
❖ When the algorithm process application now updates the ControlAction actionStatus to
‘complete’, the process request response cycle is completed.
The process described above provides a flexible framework for connecting WP3 tasks/jobs with
the CE data and the user pilots. From the perspective of the WP3 technologies, running a specific
task involves two different types of components:
❖ Wrapping software: This software is responsible for handling the communication and the
requests from/to the GraphQL interface of the CE, and for triggering the specific Task software
that needs to run.
❖ Task software: The software that runs the actual computations of the task.
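The separation between the two components can be sketched as follows. The task function knows nothing about the CE, while the wrapper handles status updates and result storage through callbacks. All names here (`mean_value`, `TASKS`, `handle_job`, the `ep-mean` identifier and the callback signatures) are hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, List, Tuple


def mean_value(params: dict) -> dict:
    """Task software: a plain computation that knows nothing about the CE."""
    values = params["values"]
    return {"mean": sum(values) / len(values)}


# Maps a (hypothetical) EntryPoint identifier to the task software it triggers.
TASKS: Dict[str, Callable[[dict], dict]] = {"ep-mean": mean_value}


def handle_job(job: dict,
               store_result: Callable[[str, dict], str],
               update_status: Callable[[str, str], None]) -> str:
    """Wrapping software: run the task and report back through callbacks."""
    task = TASKS[job["entry_point"]]
    update_status(job["identifier"], "running")
    output = task(job["params"])
    reference = store_result(job["identifier"], output)  # public location
    update_status(job["identifier"], "complete")
    return reference


status_log: List[Tuple[str, str]] = []
stored: Dict[str, dict] = {}


def store_result(job_id: str, output: dict) -> str:
    # Stand-in for writing to an external repository and returning its URL.
    stored[job_id] = output
    return f"https://example.org/results/{job_id}"


reference = handle_job(
    {"identifier": "ca-1", "entry_point": "ep-mean", "params": {"values": [1, 2, 3]}},
    store_result,
    lambda job_id, status: status_log.append((job_id, status)),
)
```

Because the wrapper only depends on the two callbacks, the same task function could be run on a different server from the one that talks to the GraphQL interface.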
In general, these two pieces of software may be hosted on different servers. This procedure is
global (generic) and can be applied to all of the technologies under WP3. In the main body of the
deliverable, we will provide more details on the WP3 technologies side, e.g. where these technologies
will be hosted and run, and where the results will be saved. In principle, the intention is to run all
the Wrapping Software components, which are responsible for handling the communication with the
GraphQL interface and the job requests and for invoking the appropriate programs, on a dedicated
server provided by UPF. However, where the actual computations will be run depends on the
developers of the individual tasks. In the rest of the executive summary, we provide some details on
the individual tasks and subtasks of WP3.
The technologies under Task 3.2 - Music Description (corresponding to Deliverable 3.2
- Music Description) involve tools for extracting low- and medium-level audio descriptors, which
will be extracted using the Essentia software. Regarding the high-level descriptors that correspond
to Rhythm Descriptors, Music Similarity and Emotion Tag Annotation (sub-sections 2.2.3, 2.2.5 and
2.2.6 of Deliverable 3.2 - Music Description), as well as Symbolic Descriptors (Section 2.3) and
Video Descriptors (Section 2.4), we will follow the same approach as the one described in the previous
subsections, the only difference being that the algorithm software will not be Essentia but in-house
UPF methods or other open-source state-of-the-art methods. Singing voice analysis will be deployed
using different algorithms: Voiceful Cloud’s VoDesc, Essentia and
the TROMPA Choir Singing Analysis algorithm.
Regarding Task 3.3 - Audio Processing, the singing synthesis will be integrated using Voctro Labs’
Voiceful Cloud service. The motivation for using this platform is twofold: first, it is an existing
working solution for singing synthesis that already stores its results in S3 servers, publicly accessible
through URLs to which the CE can store references; second, it facilitates the immediate adaptation
of the developed technologies to other use cases for exploitation that may arise beyond the project.
Extending this Cloud API service for TROMPA will consist of two main tasks:
❖ Deploying the new models in the cloud servers. As detailed in Deliverable 3.3 - Audio
Processing, choir singing synthesis has specific requirements beyond those of solo singing
synthesis. During the project, new synthesis models will be created from new recordings for
multiple languages. These new models will be made accessible through the cloud service.
❖ Adapting the API to support the choir case. Currently, the VoSynth API accepts monophonic
input scores in MusicXML format, as well as in other specific .txt and .json formats for
synthesis. For the choir use case, the API will be adapted to also accept MEI scores
and to support polyphonic material, i.e. receiving a score containing all the voices to be
synthesized and outputting one audio file per voice.
For Task 3.4 - Visual Analysis of Scores, a machine-readable representation of a musical score can be
generated in one of three basic ways: manual encoding, optical music recognition (OMR) or
conversion from an existing representation. For several purposes within TROMPA involving the
on-screen display of scores we shall be using OMR technology to extract complete or partial
machine-readable score representations in the MEI format from scores saved as PDF or other
graphical formats. We choose to work with open-source programs which can be modified or adapted
to the purposes of TROMPA, which can be run in batch mode over large collections of music, and
which save their results in a form (such as MusicXML) which can be easily converted to MEI for use
within TROMPA. For standard modern musical notation we shall be using the open-source program
Audiveris. Audiveris is generally accurate on good-quality scores that are in good condition and
well photographed/digitised. For the music-scholars’ use case within TROMPA, we shall also be
working with vocal music from the 16th century, for which we shall use the specialist software Aruspix.
As with other WP3 technologies, each piece of software used will be assigned to an
EntryPoint. The wrapper software (Figure 5.4) will (possibly) be hosted on the UPF server. The
infrastructure hosting the computation components is to be determined in the
future according to the computation demands, which depend on the use cases that involve OMR.
Task 3.5 - Alignment of Musical Resources concerns the interlinking of multimodal
representations of a musical expression at different levels of granularity. Various algorithms and
software packages to accomplish this task are described in Deliverable 3.5 - Multimodal Music
information Alignment. For present development, we are focussing our efforts on the MAPS tool
(“Matcher for Alignment of Performance and Score”) under ongoing in-house development at MDW,
which has the advantage of natively supporting MEI. MAPS will be called through a wrapper layer
(“MAPS wrapper”) responsible for accepting new jobs (e.g., performances) from TROMPA users,
registering these within the CE, and spawning MAPS instances to complete the jobs. MAPS instances
are processor intensive and should thus be deployed on a dedicated high-performance server -
potentially at UPF. The MAPS wrapper is a comparatively light-weight web service that could be
deployed separately, e.g. at MDW.
Task 3.6 - Multimodal Cross Linking comprises two different pieces of software, the user
interface (Authority registration and linking) and the web scraper:
❖ Authority registration and linking: This software has a user interface to create and describe
web resources and the relations between entities and resource types. For example, Wikidata
will be registered as an authority that provides Person Entities, Musical Work Entities and other
entities that will be stored in the CE.
❖ Web scraper: The web scraper will retrieve entity information from the web resource stored
in the CE and find links to other registered authorities. It will save the referenced
entities in the CE.
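One step of the web scraper, finding links that point to other registered authorities, can be sketched by matching link hostnames against a registry. This is an illustration only: the registry layout, the hostnames and the example links are assumptions, and a real scraper would first have to extract the links from the retrieved web resource.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def links_to_authorities(links: List[str],
                         authorities: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (authority name, link) pairs for links hosted by a registered authority.

    `authorities` maps an authority name to the hostname of its web resource.
    """
    by_host = {host: name for name, host in authorities.items()}
    found = []
    for link in links:
        host = urlparse(link).hostname
        if host in by_host:
            found.append((by_host[host], link))
    return found


# Illustrative registry; only Wikidata is named as an authority in the text.
registered = {"Wikidata": "www.wikidata.org", "ExampleAuthority": "authority.example.org"}

page_links = [
    "https://www.wikidata.org/wiki/Q255",
    "https://example.org/unrelated",
    "https://authority.example.org/work/123",
]

matches = links_to_authorities(page_links, registered)
```

Matching on the exact hostname keeps the sketch simple; a production scraper would likely need to normalise hostnames and handle redirects.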
The user interface (Authority registration and linking) interacts with contributors and the CE. On the
other hand, the web scraper interacts between the CE and the web resources of registered
authorities. The multimodal cross linking will serve use cases where the linked data can be used as
background reference or where data can be enriched for the end user. This applies primarily to the
use cases for music scholars and music enthusiasts. Both pieces of software will run as microservices
in a cloud environment under a TROMPA subscription. The authority registration and linking software will
register user input from a web-interface and store the data in the CE. The web scraper takes data
from web-resources of registered authorities and saves the linked data in the CE.
Rather than being a concrete piece of software, the TROMPA Processing Library is a collection of
Contributor Environment functionalities that allow the interaction of WP3 components with the CE
data, the different software components that correspond to the various WP3 tasks, and the
organization amongst them. For the next period of the project, we plan to develop the TPL as follows:
❖ Preliminary Development (M13-M18): By the end of this period we will test all of the
individual components of WP3 and validate the correct communication with the CE:
assignment of tasks from the CE, execution of tasks, storage of data.
❖ First Working Version (M19-M22): By the end of this period we will provide a first full
version of the TPL. This will be delivered 2 months prior to M24 and MS3, when the first
version of the working prototypes of the pilots is due, in order to facilitate the
development of the pilots.
❖ Incorporation of Latest WP3 Components (M23-M24): TPL will incorporate the final versions
of the components for Task 3.3 - Audio Processing and Task 3.5 - Alignment of Musical Resources.
❖ Incorporation of Final WP3 Components (M25-M30): TPL will incorporate the final versions
of the components for Task 3.4 - Visual Analysis of Scanned Scores and Task 3.6 - Multimodal Cross Linking.
❖ Final Version (M31-M34): Final adaptations/debugging. There will be an effort to have most
of the components centralised to ensure sustainability after the end of the project.
Report Number: TR-D5.3-TROMPA Processing Library_v1