Developing Traditions of Music Notation and Performance on the Web

written for

WedelMusic 2002

which took place at the

Fraunhofer Institute for Computer Graphics
Darmstadt, Germany
from 9th - 11th December 2002

Additions 2009
Sonal Atoms and the complete transcription
described in the second part of this paper,
are now available at this website here.
I have also added audio examples,
where appropriate, below.

Abstract
Keywords

1 Introduction: The cultural context

2 A proposed software architecture

2.1 The top level components
2.2 Developing written music using computers

2.2.1 The score/recording
2.2.2 The agent (libraries)

2.3 Event analysis (‘chunking’)

2.3.1 The transcription of durations
2.3.2 Symbol overloading

3 General remarks

3.1 Music notation as a programming language
3.2 The development of temporal styles
3.3 Relations to Artificial Intelligence

4 Transcribing Sonal Atoms by Curtis Roads

4.1 The project's background
Sonal Atoms (audio) [added 2009]
4.2 Project objectives
4.3 Stage 1: Chunking and the basic event classes
4.4 Stage 2: The final score
4.4.1 Timbre classes
4.4.2 Additional symbols
Example - page 3 of Sonal Atoms
Example - page 10 of Sonal Atoms

5 Conclusions

References to books
Appendices
A footnote about the conference
The printed proceedings

Abstract

There are many living traditions of written music, each of which consists of evolving, high-level symbolic notation and performance practices. A finite, universal standard for music notation and its performance is therefore neither possible nor desirable. The web should, on the contrary, allow written and aural traditions to develop freely and independently.
The aim of this paper is to describe some concepts which would allow different musical traditions to evolve in parallel, and to suggest some orientation for programmers wanting to write flexible tools for a developing culture.
After a brief introduction, an analysis of the top level mechanisms by which written and aural traditions evolve is presented. This high level description is subsequently compared to a proposal for a general architecture for music editing software. This architecture, and the relation of its symbol libraries to agent technologies, is covered in greater detail in The Writing of Style Libraries for Music Notation and Performance.
A concrete example is then provided, showing how the author began to develop one such library of symbols while transcribing a piece of electronic music (Sonal Atoms by Curtis Roads).

Keywords

space, time, space-time, context, symbol, meaning, level, agent

1 Introduction: The cultural context

There is currently a conceptual gulf between symbol-based music notation editors such as Finale and Sibelius, and programs such as Pro Tools or IRCAM’s AudioSculpt which use space-time representations. The former are used to write music for human performers, the latter to control synthesizers and recording studios.
Standard music notation is in wide use, but its relation to real events is difficult to program or to control. Space-time representations are relatively easy to implement, but it is difficult to encapsulate logical concepts or add additional information to such diagrams, so they do not encourage the kind of high-level structural thinking characteristic of a developing musical tradition.
Both these forms of representation have their roots in the dualistic western 19th century time paradigm (real time = clock time + expressivity). This paradigm has strong ties with western philosophy and the western music tradition of the past 300 years or so, but it is not appropriate for describing anything else. Its collapse at the start of the 20th century marked the end of a long period of development, and produced a crisis in written western music which has still not been overcome. Classic accounts of the mid 20th century composers' time-notation paradigm can be found in Boulez P.: Penser la musique aujourd'hui [1], and Stockhausen K.: ...wie die Zeit vergeht... [2]. My own view of the historical context is explained in more detail in The Notation of Time (1985), and Inherited Problems and a Proposed Solution (2002).

In spite of their considerable power, there are many kinds of music which the standard music notation editors can only accommodate with difficulty (for example: Gregorian Chant, Elizabethan Lute music, 16th century Italian Madrigals, 20th century Avant-Garde graphics, unknown 21st century developments, Japanese, Indian and other non-western musics). Not only do working procedures and file formats become unnecessarily complicated in such cases, but everything has to be conceived as an ad hoc change to some suspect 19th century concept. There is no way to embody a special tradition of notation and performance practice in a piece of software (a library) which could be used to develop those concepts directly.

2 A proposed software architecture

2.1 The top level components

The above diagram shows the general way in which I think written musics have developed. There are three main areas: time, space-time and space. The time area contains events. (Events contain information which is associated with a unique moment in time.) The space area contains objects. (Objects contain information which is independent of time. The information stored in an object can be copied exactly - it is independent of any physical decay of the medium.) The space-time area contains entities which involve both space and time.
The performer/writer, who is at the top middle of the space-time area, can create events (in the time area) using any hard- and software which may be available. This could be anything from a single musical instrument to a large modern studio having many synthesisers, samplers, microphones, performers, computers etc. Events in time are a "performance", which is an instantiation of general rules within a developing performance practice.
The performer/writer can also create objects (in the space area) such as a musical score or a physical recording of a performance. As with events (temporal instantiations), such objects contain instantiations of evolving symbol definitions at a particular point in time. The instantiations have unique local values which are stored in the object.
Written scores contain unique spatial values for the positions of symbols, but the temporal meanings of those symbols are only defined generally (i.e. non-uniquely) in a tradition of performance practice stored in the minds of those who have learned it. The information contained in physical recordings can, however, be turned directly into temporal events using a piece of standardised machinery. Recordings are cultural reference points, independent of the passage of time.
The performer/writer (the developer of a style) is part of a double feedback loop. A personal stylistic evolution occurs as unique circumstances are explored by trial and error. Those aspects of events which are thought to be important are fixed in space by instantiating some suggestive glyph from an evolving set of conventions.
Alternatively, going the other way round the diagram, the score can be edited (object creation, writing) - whereby default spatial values are taken from the evolving personal notation practice - and performed using the corresponding performance practice so that the events can be heard (listening, event perception). This double feedback loop was, I think, responsible for the development of writing in general, and western written music in particular. This extraordinary cultural achievement was the result of harnessing deep connections between ears (time) and eyes (space). The collapse of the 19th century notation paradigm, and the associated failure to cultivate these connections, are, I think, the main reasons for the current lack of development in written music.
Beneath the performer/writer is a block containing the evolving, externalised tradition of performance and notation practice. The conventions used for relating events to objects are developed by individuals, but taught to others so as to create a tradition. The process continues by recursion, beyond the lifetimes of individuals, as students become teachers.

2.2 Developing written music using computers

2.2.1 The score/recording

This consists of a GUI having a set of nested windows. Traditionally, of course, scores have been written on two-dimensional paper, and "performance practice" has been developed and stored in the minds of composers and performers. But we are dealing here with music represented on computer screens, and the GUI can consist of nested, editable windows. This simplifies the process of defining the symbols considerably. One can start with very basic symbols, and add new ones as required.
The upper windows contain high-level symbolic information about the events in a particular piece of music. The symbols are the names of the windows in the levels below. The highest analog, machine level (A1) contains the temporal information necessary for a particular performance. Lower machine levels may exist, for example to change the sensitivity of some device.
Given a tradition of performance practice, the symbolic levels of the GUI can be read and converted into events without the use of mechanical aids (i.e. ignoring the information in the levels below the threshold of perception). It is therefore quite feasible to design such a symbolic level window to be printed out and performed from a paper copy - for example in the notation style of Mozart. The "Mozart" notation style would not need to include complex, 20th century symbols. There is a finite set of symbols in Mozart's notation style.
The score distributed over the Web might be a traditional, two-dimensional, one-level diagram or the complete, inter-connected, multi-level set of diagrams complete with sufficient information to allow a particular performance to be reconstructed.
If the user is in possession of an agent which can perform the symbolic levels (the symbols and the agent must use the same notation conventions) then the agent can learn from the new performance information and/or ignore it in order to perform the score "creatively" on its own.

2.2.2 The agent (libraries)

For each of the window levels in the score, there are two corresponding, interdependent libraries defining the general characteristics of each symbol defined at that level. One of these defines the symbols' spatial behaviour (the notation practice), the other defines the contents of the subordinate window for that symbol. Temporal behaviour (performance practice) is stored as the relationship between windows S1 and A1.
When a user adds a symbol to S1, the software looks at the library definition of the symbol to see how it should behave on screen (e.g. centering a staccato dot above a notehead) and what effect that has on the performance (the staccato dot might have a default effect on the envelope curve). In such a case, the default meaning of the staccato dot need not be accepted. The user can edit the precise meaning at that point in the score by editing the envelope in A1. The local meaning of a symbol can change according to context. The spatial behaviour of staccato dots has remained practically identical for many composers over the past 400 years, but even the general, default meanings of those dots have changed considerably.
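The relation between a library default and a local edit can be sketched as follows. This is only an illustration in Python, not part of any existing editor: the class names, the `default_envelope_scale` parameter and the numeric values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SymbolDefinition:
    """Hypothetical library entry for one symbol."""
    name: str
    # Notation practice: position of the symbol relative to its notehead.
    place: Callable[[float, float], Tuple[float, float]]
    # Performance practice: default effect on the note's envelope (assumed scale factor).
    default_envelope_scale: float


@dataclass
class SymbolInstance:
    """One occurrence of a symbol in the score (window S1)."""
    definition: SymbolDefinition
    local_envelope_scale: Optional[float] = None  # set only if the user edits A1

    def envelope_scale(self) -> float:
        # A local edit wins; otherwise fall back to the library default.
        if self.local_envelope_scale is not None:
            return self.local_envelope_scale
        return self.definition.default_envelope_scale


staccato = SymbolDefinition(
    name="staccato dot",
    place=lambda notehead_x, notehead_y: (notehead_x, notehead_y + 1.0),  # centred above
    default_envelope_scale=0.5,  # assumed default: halve the sounding length
)

first = SymbolInstance(staccato)                             # accepts the default
second = SymbolInstance(staccato, local_envelope_scale=0.3)  # edited locally
```

Here `first.envelope_scale()` gives the library default, while `second.envelope_scale()` gives the locally edited value; the spatial rule in `place` is the same for both.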
Feedback is possible between the scores and libraries. This can happen while editing different scores which use the same conventions and performing each of these in different ways. Such feedback can, over a period of time, be utilised to turn the libraries into performing agents. Editing the temporal characteristics of particular instances of symbols, taking local context into account, can gradually teach the libraries how to play a score convincingly in a particular style.

2.3 Event analysis (‘chunking’)

Chunking is the process whereby each event in a series of events (represented in a space-time diagram) is given a separate symbol (or name), and the event's name is connected to the space-time information at a lower level.
The top part of the following diagram is a typical space-time representation of a series of events. As far as a machine is concerned, this is a single, undifferentiated curve. People however instinctively break such curves into manageable chunks. Such chunks can be labeled just by putting a dot on each peak (the dot can be oval, like a notehead).
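A minimal sketch of this labelling step, in Python: local maxima of a sampled curve are found, and each is given a name linked to its position in the curve. The function names, the `threshold` parameter and the sample values are illustrative assumptions; a real transcription would need a more careful notion of a peak.

```python
def find_peaks(curve, threshold=0.0):
    """Return the indices of local maxima above `threshold` in a sampled curve."""
    peaks = []
    for i in range(1, len(curve) - 1):
        rising = curve[i - 1] < curve[i]
        not_rising_after = curve[i] >= curve[i + 1]
        if curve[i] > threshold and rising and not_rising_after:
            peaks.append(i)
    return peaks


def chunk(curve, threshold=0.0):
    """Give each peak a name and connect the name to its place in the curve."""
    return {f"event{n + 1}": i for n, i in enumerate(find_peaks(curve, threshold))}


# Hypothetical amplitude samples: two peaks stand out above 0.5.
amplitude = [0.0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.7, 0.2, 0.0]
names = chunk(amplitude, threshold=0.5)
```

The dictionary `names` plays the role of the symbolic level: each key is a symbol, and its value points back to the space-time information at the lower level.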

2.3.1 The transcription of durations

The lengths of the ‘events’ can be classified, irrespective of the existence of a tempo, using a logarithmically constructed trammel. Using the classic duration symbols means that legibility can be improved later (horizontal spatial compression, use of beams).
It would be useful for the standard notation of tempoed music to be a special case here: A histogram can always be constructed from the lengths of the events, so if the diagram represented a piece of classical music with durations having proportions ca. 2:1, then it would be very easy to construct a trammel to produce the original notation. If there are no such proportions in the original diagram, the user might relate the trammel to the shortest length, or try to ensure maximum differentiation in the resulting transcription.
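A logarithmically constructed trammel can be sketched in a few lines of Python. The sketch assumes that each class boundary is double the previous one and that a length lying exactly on a boundary belongs to the shorter class; the function names and the choice of 0.2 as the first boundary are assumptions for illustration.

```python
def make_trammel(first_boundary, classes):
    """Upper boundaries of `classes` duration classes, each double the last."""
    return [first_boundary * 2 ** k for k in range(classes)]


def duration_class(length, trammel):
    """Index of the first class whose upper boundary is >= `length`."""
    if length <= 0:
        raise ValueError("an event must have a positive length")
    for index, upper in enumerate(trammel):
        if length <= upper:
            return index
    raise ValueError("length exceeds the longest duration class")


trammel = make_trammel(first_boundary=0.2, classes=6)
lengths = [0.15, 0.5, 1.0, 6.4]  # hypothetical event lengths
classes = [duration_class(length, trammel) for length in lengths]
```

Relating the trammel to the shortest event, as suggested above, would amount to choosing `first_boundary` from the histogram of event lengths instead of fixing it in advance.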

2.3.2 Symbol overloading

All symbols have freely definable meanings in the proposed libraries. Conventional meanings can be stored in standard libraries, but this does not preclude unconventional uses and/or symbol overloading. Here, for example, the ‘pitch’ symbols are being used as slider knobs to control dynamic and filtering.

3 General remarks

3.1 Music notation as a programming language

Adding a staccato dot to one of the symbolic levels of the GUI would change the envelope stored in the related notehead. This is an example of a general rule: When a user changes or edits an object in one of the symbolic level windows in the GUI, the software has to take the local context of that change into account in order to generate default temporal values for the known parameters of all the related symbols in the vicinity.
The precise value of each individual parameter in each individual symbol is unique, and can relate both actively and passively to the symbol's local context in ways defined in the library. To understand intuitively the "correctness" of the default value which the software provides is to recognise the style defined in the library.
Notice that because the default values of particular instantiations of symbols are discovered by performing calls to functions, there are no limits to the complexity of the spatial or temporal style.
It is even possible to think of music notation as a two dimensional, event-oriented programming language: In standard music notation, chords such as

are constructed, using four lists of symbols attached to a central core:

The core contains noteheads and the duration class symbol. listL and listR contain as many sub-lists as there are noteheads in the core.
Such a chord is an event constructor in the 2-dimensional language. In 1-dimensional languages such as C++ [3], the constructor would be written as a sequence of characters something like this:

            event(core, listT, listL, listB, listR);
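As a hedged illustration of what such a constructor might hold, here is a small data structure, written in Python rather than C++ for brevity. The reading of listT, listL, listB and listR as symbols above, left of, below and right of the core is an assumption, as are all class, field and symbol names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    noteheads: List[str]   # one entry per notehead, e.g. pitch names
    duration_class: str    # the duration class symbol shared by the chord


@dataclass
class Event:
    core: Core
    list_t: List[str] = field(default_factory=list)        # assumed: symbols above the core
    list_l: List[List[str]] = field(default_factory=list)  # one sub-list per notehead
    list_b: List[str] = field(default_factory=list)        # assumed: symbols below the core
    list_r: List[List[str]] = field(default_factory=list)  # one sub-list per notehead

    def __post_init__(self):
        count = len(self.core.noteheads)
        for name in ("list_l", "list_r"):
            sublists = getattr(self, name)
            if not sublists:
                # No symbols given: create one empty sub-list per notehead.
                setattr(self, name, [[] for _ in range(count)])
            elif len(sublists) != count:
                raise ValueError(f"{name} needs {count} sub-lists, got {len(sublists)}")


chord = Event(
    core=Core(noteheads=["C4", "E4", "G4"], duration_class="quarter"),
    list_t=["staccato"],
    list_l=[[], ["sharp"], []],  # an accidental attached to the middle notehead
)
```

The check in `__post_init__` enforces the one rule stated in the text: listL and listR have as many sub-lists as there are noteheads in the core.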

The logical x-position (horizontal position) of a 2-dimensional chord is independent of the shapes and sizes of the symbols in its argument lists, and there are well-known (if complex) spatial rules for maximizing legibility when combining chords both in sequence and in parallel on a page.
High-level, 2-dimensional languages like music notation could be developed for use in multiprocessing situations. Orchestra scores are the prototype for coordinated parallel processing.

3.2 The development of temporal styles

There are two levels at which an editor's symbol libraries can be defined:

Level 1: Defining a completely new symbol (a name) together with its subordinate window (its definition or set of control structures). Non-expert users could skip this level of the software. Some composers are however already using environments such as IRCAM's OpenMusic which allow users to drag-and-drop abstract control structures to create custom control sets.
Level 2: Setting the preferences for the default values of a given set of controls. This could, for example, be done by demonstrating the values in real time - a method which would be especially important if the user cannot understand or directly edit the definitions in the library.

Default values for each symbol's known control set could also be related to a statistical analysis of many instances of that symbol in one or more performances of a given score. The default value could, for example, easily be related to a Gaussian distribution of already existing values. A multi-level score of a piece of classical music could be fed with unlimited numbers of recordings in order to seed its libraries with a classical performance style.
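One way to relate a default to a Gaussian distribution of existing values is sketched below in Python. It assumes that the values observed for one control of one symbol are simply summarised by their mean and standard deviation; the function names and sample values are hypothetical, and a real library might well weight the observations by context.

```python
import random
import statistics


def fit_default(observed):
    """Mean and sample standard deviation of the values seen for one control."""
    return statistics.mean(observed), statistics.stdev(observed)


def default_value(observed, rng=None):
    """The centre of the fitted distribution, or a random draw from it."""
    mean, deviation = fit_default(observed)
    if rng is None:
        return mean
    return rng.gauss(mean, deviation)


# Hypothetical values of one control, collected from earlier performances.
observed = [0.25, 0.5, 0.75]
steady = default_value(observed)                    # always the mean
varied = default_value(observed, random.Random(1))  # differs from call to call
```

Returning the mean gives a repeatable default; drawing from the distribution gives each new instance of the symbol a slightly different value within the observed style.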
Some libraries might learn a user's preferences by continuously observing how that user sets or changes particular values in the score, or allow users to edit Gaussian distributions directly...

3.3 Relations to Artificial Intelligence

This subject is treated more fully in The Writing of Style Libraries for Music Notation and Performance, but it is worth noting that the concepts described here interact with many music- and time-related projects.
AI research into performance practice and expressivity in particular kinds of music may well suggest algorithms which could be used to program the temporal meanings of symbols. This in turn would create written traditions of performance practice (software is a form of writing) which would especially benefit lesser known performance traditions and New Music. Individuals would be able to learn a new style realistically in their own time, making expensive group rehearsals more productive.
AI algorithms could also be used to develop expressive speech for automata.

4 Transcribing Sonal Atoms by Curtis Roads

This project is described here as an example, showing how one particular library of symbols began to develop.

4.1 The project's background

After reading my essay The Notation of Time (which is on my website, but which is no longer available in print), Curtis Roads invited me to give a lecture at the CREATE Institute at UCSB during February 2002. This gave me a chance to provide an updated version of my thoughts on the background to the crisis in 20th century music notation (Inherited Problems and a Proposed Solution). As a follow-on project, he commissioned me to create a symbolic score of his piece Sonal Atoms which is purely electronic music, made using particle synthesis methods.
The source materials he sent me included an audio CD of the piece, some background information about particle synthesis, and some printouts of AudioSculpt screenshots of the piece.
AudioSculpt provides several tools which make transcribing easier (replay of user-defined sections, zoom functions for the visual display, a tuning-fork which plays a sine-wave with the amplitude and frequency of any point in the sonogram).
While we corresponded about certain details, he kindly allowed me to have the last word in any decisions. I was very happy to accommodate him if this did not conflict with the project objectives outlined below. We were working to a finite budget, so I had to ignore certain aspects of the composition (in particular the spatial distribution of the sounds) and restrict the amount of information I could include about the pitches. (I usually just notated the main perceived pitch in an event.)

Addition 2009: Sonal Atoms (audio)

4.2 Project objectives

The project objectives can be summarised as follows:

To create a symbolic representation of the piece.
The transcription should be 2-dimensional (printable on sheets of paper), but conceptually a printout of one of the windows of the GUI described above.
To proceed as if training an agent capable of making automatic transcriptions of similar pieces - all steps should therefore, in principle, be programmable.
To investigate unforeseen problems and isolate further generalisations.

4.3 Stage 1: Chunking and the basic event classes

The first stage of transcription was to decide which events could be named. I therefore started by aligning noteheads (with their duration class) directly under the sonogram while listening to the piece and looking at the sonogram. I used a trammel having the following duration class values (space-seconds in the sonogram):

0 < [symbol] <= 0.2 < [symbol] <= 0.4 < [symbol] <= 0.8 < [symbol] <= 1.6 < [symbol] <= 3.2 < [symbol] <= 6.4

After annotating a few pages, it became clear that the events could be subclassed into Points, Lines and Clouds. (The composer had already mentioned these categories - Sonal Atoms is one of a cycle of pieces called Point, Line, Cloud.) I decided that:

All events which last less than 0.1 sec are Points.
Lines have a perceptible pitch, and appear in the sonogram as horizontal or diagonal lines.
Clouds have no pitch.
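Since all steps were meant to be programmable in principle, the three rules above can be written as a small function. This Python sketch assumes that the duration rule takes precedence over the pitch rule, and that "perceptible pitch" has already been decided elsewhere (by ear, or by some analysis not shown here); the function and parameter names are hypothetical.

```python
def event_class(duration, has_pitch):
    """Classify an event as a Point, Line or Cloud.

    `duration` is in seconds; `has_pitch` says whether a pitch is perceptible.
    """
    if duration < 0.1:
        return "Point"   # all very short events, whatever their pitch content
    if has_pitch:
        return "Line"
    return "Cloud"


# Hypothetical events: (duration in seconds, perceptible pitch?)
events = [(0.05, True), (0.8, True), (2.4, False)]
labels = [event_class(duration, has_pitch) for duration, has_pitch in events]
```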

Addition 2009
The sound of the following example:

Stage 1: naming events

4.4 Stage 2: The final score

Before transferring this basic symbolic information to the final score - where spacing is used not to denote timing, but to improve legibility - various other information was added:

4.4.1 Timbre classes

While completing Stage 1, I decided to assign more precise designations of the timbres, by creating sub-classes of the Points, Lines and Clouds.
The composer asked me to use icons rather than words to designate timbres: “Some electronic music composers have tried to simulate acoustic instruments. Not me.” Nevertheless, I used verbal names for the Point classes as a general orientation while classifying the timbres (by ear and eye). Note that each icon stands for a range of timbres, just as the duration symbols stand for ranges of durations. Timbres should also be editable in the GUI...

4.4.2 Additional symbols

The standard pitch symbols are used to label quarter-tone bandwidths. The symbol for the A above middle C is used to denote any frequency greater than or equal to 433.69 Hz and less than 446.40 Hz. Usually, only the main perceived pitch is notated. The frequencies were transcribed using the AudioSculpt "tuning-fork" tool. This plays a sine-wave having the frequency and amplitude of the point at which it is held in the sonogram while displaying the values of these parameters in a separate window.
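The band limits quoted for the A above middle C follow if that A is taken as 440 Hz and each band extends an eighth-tone (1/48 of an octave) either side of its centre. The Python sketch below works on that assumption; the 440 Hz reference and the function names are not stated in the text and are introduced only for illustration.

```python
import math

A4 = 440.0  # assumed reference; it reproduces the band quoted in the text


def quarter_tone_band(centre):
    """Lower (inclusive) and upper (exclusive) edge of the band around `centre`."""
    half_width = 2 ** (1 / 48)  # a quarter-tone is 1/24 octave; half of that is 1/48
    return centre / half_width, centre * half_width


def band_index(frequency, reference=A4):
    """Number of quarter-tone steps from `reference` to the band holding `frequency`."""
    return math.floor(24 * math.log2(frequency / reference) + 0.5)


low, high = quarter_tone_band(A4)  # approximately 433.69 Hz and 446.40 Hz
```

With these definitions, `band_index(434.0)` is 0 (the same symbol as 440 Hz), while `band_index(433.0)` is -1 and `band_index(446.5)` is 1, the neighbouring quarter-tone bands.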

The symbols for dynamics have been allocated purely subjectively. The equivalent mechanical process would involve a 2- or 3-dimensional trammel having amplitude, frequency and timbre-class as its parameters.

2.03 etc.

Most of the events in the final score have an identification number equal to the time coordinate of the event in the sonogram, rounded to 1/100 sec. This is purely for reference purposes, so that events can be easily found again in the sonogram. The numbers do not represent anything perceptible. They are like rehearsal numbers.

The curve of a glissando tries to reflect the shape of the curve in the sonogram. This cannot be done exactly however, because the space in the final score is subject to the physical sizes of symbols, and is primarily related to legibility.

Staccato blips. These symbols denote perceptible events nested inside the event notated with a duration symbol. In this piece, such nested events often occur in different spatial locations (channels).

When used in conjunction with a timbre icon, this continuation symbol means that the sound is the same for each event. Successive timbres which sound different even though they have the same timbre class are notated with a separate timbre icon for each event.

An abstract operator symbol. This is used to describe a combination of two Cloud timbres. The operation itself is not further specified. It could be some form of modulation or filtering.

( )

Cautionary Point timbres are used to qualify Cloud timbres.

Addition 2009
The sound of the example below (page 3 of Sonal Atoms):

Addition 2009
The sound of the example below (page 10 of Sonal Atoms):

5 Conclusions

Printed music scores contain objects (symbols) which are either the names of classes (durations, pitches, timbres, dynamics etc.) or comments (vertical dotted lines, rehearsal numbers etc.). In the proposed GUI, symbols can be edited in their own window and in a subordinate window where their meaning is defined. The libraries which define the symbols of a notation can be developed into agents capable of performing a score with an arbitrary degree of realism.
Developing agents, shared by particular communities of users, would embody evolving music traditions. Interesting consequences for both scholars and pioneers arise because traditions of performance practice would here, for the first time, be written (software is a form of writing).

References to books

[1] Boulez P.: Penser la musique aujourd'hui (Paris 1963); English translation by Bradshaw, S. and Bennett, R. R. as Boulez on Music Today (London: Faber and Faber, 1971)
[2] Stockhausen K.: ...wie die Zeit vergeht... in Texte zur Musik Band 1 (DuMont, Cologne 1963); Also in Die Reihe #3 - Eng. trans. by Cardew, C. as ...how time passes... (1959)
[3] Stroustrup, B.: The C++ Programming Language, Third Edition. Addison Wesley, 1997. ISBN 0-201-88954-4

Appendices

A footnote about the conference (WedelMusic 2002)

This was a very exciting conference. I would especially like to thank Martin Schmucker, the other local conference organisers and Anne Jacobs at the IEEE for bending a few rules so that I could attend.
The music world is currently in turmoil. Apart from the notation problems (which are coming home to roost because they have been neglected by the musical establishment for at least 30 years), it is proving difficult or impossible to enforce the copyright laws. There were, of course, problems with both notation and copyright before the internet arrived, but these problems have become even more pressing as a result. It seems to me that, as in the case of the old music notation paradigm, the 18th century copyright paradigm is failing, and that something else is urgently needed to replace it. After listening to the many conference contributions which revolved around the subject, I have begun to think seriously about developing a tenable position.
Delivering a 25 minute presentation at a conference is very different from writing a paper on the same subject. A conference is an event, a written paper is an object. So there were a few differences between what I did and what I have written above:

The diagram in §2.1 was simpler in the presentation. I was talking to people with a broad range of backgrounds, and only had a few minutes to get these ideas across, so had to simplify and use a less than formal language. In a written paper, one can formulate ideas more precisely, leaving it to the readers to take as much time as they like to think about what they have read. Listeners don't have this time, and one has to be careful that they remain attentive, so live presentations have to be more entertaining, less formal.
§3 - The General Remarks were omitted in the formal presentation, because I had no time. I was, however, able to talk about these ideas in the breaks...
I approached §4 - the Sonal Atoms project - by playing the beginning of the piece from a CD while describing the task I had been set, and performing the whole piece (3' 30") at the end of the presentation (while synchronously displaying the 19 pages of the score). This was a great way to end the conference (I was the last speaker). Apart from the talk before mine (which was about the notation of Gregorian Chant), all the audio examples had had tempo (they were mostly popular, commercial music). The conference was trying to find a way to establish standards for music notation, and many of the speakers were still assuming that the 19th century time paradigm could be the basis for this. I hope I showed the conference that such standards are not possible, and at the same time how to solve some of the problems they were facing...

The printed proceedings

Unfortunately, I was not able to complete this paper in time for it to be printed in the official conference proceedings. The following extended abstract appeared instead:

High level symbolic music notations and performance practices are all related to particular cultural traditions. For example, Gregorian Chant, Elizabethan Lute music, 19th century Romantic, and 20th century Avant-Garde musics all relate to different philosophies of space and time. A universal standard for music notation or its performance is therefore neither possible nor desirable.
The web should, on the contrary, allow aural and written traditions to develop freely and independently. Specialists in particular traditions should be allowed to communicate efficiently with their peers, using scores and recordings which reflect those traditions, and which do not distort the subject matter. Living traditions should be allowed to develop.
This paper approaches the problem by isolating and describing concepts shared by all music notations, using the architecture of a proposed, general purpose music editor as a framework. The editor uses nested levels of freely definable symbols and interchangeable software libraries to encapsulate information about each individual notation and performance tradition.
Music notations have many similarities to computer programming languages. Editing music is the programming of events. It is the responsibility of the user to know or define how the symbols behave in space and what they mean in time. Terminology and conceptual advances made in the development of computer languages over the past decades are therefore relevant here.
The proposed libraries use functions to define default values for the behaviours of the symbols in space and in time. The actual spatial and temporal values of instantiated symbols in scores can automatically take local contexts into account and/or be interactively changed by users. The libraries can therefore become arbitrarily complex while remaining user-friendly (user-transparent). There is no obstacle to them becoming agents, capable of learning to transcribe and/or perform in a particular style with an arbitrary degree of realism.
Developing agents, shared by particular communities of users, would embody evolving music traditions. Interesting consequences for study and development arise because traditions of performance practice would here, for the first time, be written (software is a form of writing).
The approach implies that written music distributed over the web should either contain the information for a particular performance, or that the recipients have learned the notation and its performance practice, or that they are in possession of an agent which has done this.
The effort required to create such an environment should be compared to its potential benefits. The encapsulation of different notation and performance traditions in software libraries has very large implications for publishers and the commercial music industry (not to mention the obvious advantages to music scholars and pioneers), so the advantages of developing in this direction may be considerable. Even small libraries, containing a few simple symbols, could be very useful in some circumstances - for example in recording studio software.