Additions 2009: Sonal Atoms and the complete transcription
described in the second part of this paper
are now available on this website.
I have also added audio examples,
where appropriate, below.
There are many living traditions of written music, each of which consists of evolving,
high-level symbolic notation and performance practices. A finite, universal standard
for music notation and its performance is therefore neither possible nor desirable.
The web should, on the contrary, allow written and aural traditions to develop freely
and independently.
The aim of this paper is to describe some concepts which would allow different musical
traditions to evolve in parallel, and to suggest some orientation for programmers
wanting to write flexible tools for a developing culture.
After a brief introduction, an analysis of the top level mechanisms by which written
and aural traditions evolve is presented. This high level description is subsequently
compared to a proposal for a general architecture for music editing software. This
architecture, and the relation of its symbol libraries to agent technologies, is
covered in greater detail in
The Writing of Style Libraries for Music Notation and Performance.
A concrete example is then provided, showing how the author began to develop one
such library of symbols while transcribing a piece of electronic music (Sonal Atoms
by Curtis Roads).
Keywords: space, time, space-time, context, symbol, meaning, level, agent
1 Introduction: The cultural context
There is currently a conceptual gulf between symbol-based music notation editors
such as Finale and
Sibelius, and programs such as Pro Tools
or IRCAM’s AudioSculpt which use space-time representations. The former are
used to write music for human performers, the latter to control synthesizers and samplers.
Standard music notation is in wide use, but its relation to real events is difficult
to program or to control. Space-time representations are relatively easy to implement,
but it is difficult to encapsulate logical concepts or add additional information
to such diagrams, so they do not encourage the kind of high-level structural thinking
characteristic of a developing musical tradition.
Both these forms of representation have their roots in the dualistic western 19th
century time paradigm (real time = clock time + expressivity). This paradigm has
strong ties with western philosophy and the western music tradition of the past
300 years or so, but it is not appropriate for describing anything else. Its collapse
at the start of the 20th century marked the end of a long period of development,
and produced a crisis in written western music which has still not been overcome.
Classic accounts of the mid 20th century composers' time-notation paradigm can be
found in Boulez P.: Penser la musique aujourd'hui
, and Stockhausen K.: ...wie
die Zeit vergeht... . My own view of the historical context is
explained in more detail in The
Notation of Time (1985), and
Inherited Problems and a Proposed Solution (2002).
In spite of their considerable power, there are many kinds of music which the standard
music notation editors can only accommodate with difficulty (for example: Gregorian
Chant, Elizabethan Lute music, 16th century Italian Madrigals, 20th century Avant-Garde
graphics, unknown 21st century developments, Japanese, Indian and other non-western
musics). Not only do working procedures and file formats become unnecessarily complicated
in such cases, but everything has to be conceived as an ad hoc change to
some suspect 19th century concept. There is no way to embody a special tradition
of notation and performance practice in a piece of software (a library) which could
be used to develop those concepts directly.
2 A proposed software architecture
2.1 The top level components
The above diagram shows the general way in which I think written musics have developed.
There are three main areas: time, space-time and space. The time area contains events.
(Events contain information which is associated with a unique moment in time.) The
space area contains objects. (Objects contain information which is independent of
time. The information stored in an object can be copied exactly - it is independent
of any physical decay of the medium.) The space-time area contains entities which
involve both space and time.
The performer/writer, who is at the top middle of the space-time area, can create
events (in the time area) using any hard- and software which may be available. This
could be anything from a single musical instrument to a large modern studio having
many synthesisers, samplers, microphones, performers, computers etc. Events in time
are a "performance", which is an instantiation of general rules within a developing tradition.
The performer/writer can also create objects (in the space area) such as a musical
score or a physical recording of a performance. As with events (temporal instantiations),
such objects contain instantiations of evolving symbol definitions at a particular
point in time. The instantiations have unique local values which are stored in the object.
Written scores contain unique spatial values for the positions of symbols, but the
temporal meanings of those symbols are only defined generally (i.e. non-uniquely)
in a tradition of performance practice stored in the minds of those who have learned
it. The information contained in physical recordings can, however, be turned directly
into temporal events using a piece of standardised machinery. Recordings are cultural
reference points, independent of the passage of time.
The performer/writer (the developer of a style) is part of a double feedback loop.
A personal stylistic evolution occurs as unique circumstances are explored by trial
and error. Those aspects of events which are thought to be important are fixed in
space by instantiating some suggestive glyph from an evolving set of conventions.
Alternatively, going the other way round the diagram, the score can be edited (object
creation, writing) - whereby default spatial values are taken from the evolving
personal notation practice - and performed using the corresponding performance practice
so that the events can be heard (listening, event perception). This double feedback
loop was I think responsible for the development of writing in general, and western
written music in particular. This extraordinary cultural achievement was the result
of harnessing deep connections between ears (time) and eyes (space). The collapse
of the 19th century notation paradigm, and the associated failure to cultivate these
connections, are I think the main reasons for the current lack of development in written music.
Beneath the performer/writer is a block containing the evolving, externalised tradition
of performance and notation practice. The conventions used for relating events to
objects are developed by individuals, but taught to others so as to create a tradition.
The process continues by recursion, beyond the lifetimes of individuals, as students become teachers in their turn.
2.2 Developing written music using computers
2.2.1 The score/recording
This consists of a GUI having a set of nested windows. Traditionally, of course,
scores have been written on two-dimensional paper, and "performance practice" has
been developed and stored in the minds of composers and performers. But we are dealing
here with music represented on computer screens, and the GUI can consist of nested,
editable windows. This simplifies the process of defining the symbols considerably.
One can start with very basic symbols, and add new ones as required.
The upper windows contain high-level symbolic information about the events in a
particular piece of music. The symbols are the names of the windows in the levels
below. The highest analog, machine level (A1) contains the temporal information
necessary for a particular performance. Lower machine levels may exist, for example
to change the sensitivity of some device.
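The nesting described above can be sketched as a simple recursive data structure. This is a minimal illustration, not part of any existing editor: symbolic windows contain subordinate windows, and only the machine level (A1) holds concrete temporal values.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative sketch only: a window is named by the symbol it defines.
// Symbolic levels (S1, S2, ...) contain subordinate windows; the analog
// machine level (A1) contains the temporal values for one performance.
struct Window {
    std::string name;                              // the symbol naming this window
    std::vector<Window> children;                  // subordinate windows (empty at A1)
    std::map<std::string, double> machineValues;   // only populated at level A1
};

// Number of symbolic levels above the machine level: 0 means this window
// is itself a machine level, with no symbolic structure beneath it.
int depth(const Window& w) {
    int d = 0;
    for (const Window& c : w.children) {
        int cd = 1 + depth(c);
        if (cd > d) d = cd;
    }
    return d;
}
```

A new symbol is added simply by nesting another Window, which is why one can start with very basic symbols and add new ones as required.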
Given a tradition of performance practice, the symbolic levels of the GUI can be
read and converted into events without the use of mechanical aids (i.e. ignoring
the information in the levels below the threshold of perception). It is therefore
quite feasible to design such a symbolic level window to be printed out and performed
from a paper copy - for example in the notation style of Mozart. The "Mozart" notation
style would not need to include complex, 20th century symbols. There is a finite
set of symbols in Mozart's notation style.
The score distributed over the Web might be a traditional, two-dimensional, one-level
diagram or the complete, inter-connected, multi-level set of diagrams complete with
sufficient information to allow a particular performance to be reconstructed.
If the user is in possession of an agent which can perform the symbolic levels (the
symbols and the agent must use the same notation conventions) then the agent can
learn from the new performance information and/or ignore it in order to perform
the score “creatively” on its own.
2.2.2 The agent (libraries)
For each of the window levels in the score, there are two corresponding, interdependent
libraries defining the general characteristics of each symbol defined at that level.
One of these defines the symbols' spatial behaviour (the notation practice), the
other defines the contents of the subordinate window for that symbol. Temporal behaviour
(performance practice) is stored as the relationship between windows S1 and A1.
When a user adds a symbol to S1, the software looks at the library definition of
the symbol to see how it should behave on screen (e.g. centering a staccato dot
above a notehead) and what effect that has on the performance (the staccato dot
might have a default effect on the envelope curve). In such a case, the default
meaning of the staccato dot need not be accepted. The user can edit the precise
meaning at that point in the score by editing the envelope in A1. The local meaning
of a symbol can change according to context. The spatial behaviour of staccato dots
has remained practically identical for many composers over the past 400 years, but
even the general, default meanings of those dots change considerably.
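The staccato-dot example can be made concrete with a small sketch. The names and numeric values below (the dot's offset, the halving of the duration) are assumptions chosen for illustration; the point is that the spatial rule and the default temporal meaning are separate definitions, and that the temporal default can be overridden locally in A1:

```cpp
#include <optional>

// Illustrative sketch: a notehead with its machine-level (A1) duration,
// plus an optional user edit that overrides the symbol's default meaning.
struct Note {
    double x;                                // logical horizontal position
    double headTop;                          // y coordinate of the notehead's top
    double durationSecs;                     // duration stored at level A1
    std::optional<double> durationOverride;  // local edit made in A1, if any
};

struct StaccatoDot {
    // Notation practice: centre the dot a fixed distance above the notehead.
    // The 2.0 offset is an arbitrary illustrative value.
    static double yPosition(const Note& n) { return n.headTop - 2.0; }

    // Performance practice: the default meaning shortens the sounding
    // duration, unless the user has edited the envelope in A1 directly.
    static double performedDuration(const Note& n) {
        if (n.durationOverride) return *n.durationOverride;
        return n.durationSecs * 0.5;  // assumed default: halve the duration
    }
};
```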
Feedback is possible between the scores and libraries. This can happen while editing
different scores which use the same conventions and performing each of these in
different ways. Such feedback can, over a period of time, be utilised to turn the
libraries into performing agents. Editing the temporal characteristics of particular
instances of symbols, taking local context into account, can gradually teach the
libraries how to play a score convincingly in a particular style.
2.3 Event analysis (‘chunking’)
Chunking is the process whereby each event in a series of events (represented in
a space-time diagram) is given a separate symbol (or name), and the event's name
is connected to the space-time information at a lower level.
The top part of the following diagram is a typical space-time representation of
a series of events. As far as a machine is concerned, this is a single, undifferentiated
curve. People however instinctively break such curves into manageable chunks. Such
chunks can be labeled just by putting a dot on each peak (the dot can be oval, like a notehead).
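As a minimal sketch of the chunking step, assuming the curve has been sampled into an array, the peaks a human would dot can be found mechanically:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: return the indices of local peaks in a sampled
// amplitude curve. Each peak index could then be named (given a symbol)
// and linked to the space-time information at the lower level.
std::vector<std::size_t> findPeaks(const std::vector<double>& curve) {
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < curve.size(); ++i) {
        if (curve[i] > curve[i - 1] && curve[i] >= curve[i + 1])
            peaks.push_back(i);
    }
    return peaks;
}
```

Real chunking is of course perceptual and context-dependent; this only shows where the dots, and the links to the lower level, would attach.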
2.3.1 The transcription of durations
The lengths of the ‘events’ can be classified, irrespective of the existence
of a tempo, using a logarithmically constructed trammel. Using the classic duration
symbols means that legibility can be improved later (horizontal spatial compression,
use of beams).
It would be useful for the standard notation of tempoed music to be a special case
here: A histogram can always be constructed from the lengths of the events, so if
the diagram represented a piece of classical music with durations having proportions
ca. 2:1, then it would be very easy to construct a trammel to produce the original
notation. If there are no such proportions in the original diagram, the user might
relate the trammel to the shortest length, or try to ensure maximum differentiation
in the resulting transcription.
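A logarithmic trammel of this kind might be sketched as follows. The reference length (0.1 seconds) is an assumption; the essential property is that class boundaries double at each step, so durations in classical 2:1 proportions fall into neighbouring classes:

```cpp
#include <cmath>

// Illustrative sketch of a logarithmically constructed trammel: durations
// are assigned to classes whose boundaries double at each step. Class 0
// would print as the shortest duration symbol, class 1 as the next, etc.
int durationClass(double seconds, double shortest = 0.1) {
    if (seconds < shortest) return 0;
    // Truncated log2 of the ratio: 2:1 proportions land in adjacent classes.
    return 1 + static_cast<int>(std::log2(seconds / shortest));
}
```

Relating the trammel to the shortest length, as suggested above, just means choosing `shortest` from the histogram of event lengths.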
2.3.2 Symbol overloading
All symbols have freely definable meanings in the proposed libraries. Conventional
meanings can be stored in standard libraries, but this does not preclude unconventional
uses and/or symbol overloading. Here, for example, the ‘pitch’ symbols
are being used as slider knobs to control dynamics and filtering.
3 General remarks
3.1 Music notation as a programming language
Adding a staccato dot to one of the symbolic levels of the GUI would change the
envelope stored in the related notehead. This is an example of a general rule: When
a user changes or edits an object in one of the symbolic level windows in the GUI,
the software has to take the local context of that change into account in order
to generate default temporal values for the known parameters of all the related
symbols in the vicinity.
The precise value of each individual parameter in each individual symbol is unique,
and can relate both actively and passively to the symbol's local context in ways
defined in the library. Intuitively understanding the "correctness" of the default
value which the software provides, is to recognise the style defined in the library.
Notice that because the default values of particular instantiations of symbols are
discovered by performing calls to functions, there are no limits to the complexity
of the spatial or temporal style.
It is even possible to think of music notation as a two dimensional, event-oriented
programming language: In standard music notation, chords such as
are constructed, using four lists of symbols attached to a central core:
The core contains noteheads and the duration class symbol. listL and listR
contain as many sub-lists as there are noteheads in the core.
Such a chord is an event constructor in the 2-dimensional language. In 1-dimensional
languages such as C++, the constructor
would be written as a sequence of characters something like this:
event(core, listT, listL, listB, listR);
The logical x-position (horizontal position) of a 2-dimensional chord is independent
of the shapes and sizes of the symbols in its argument lists, and there are well-known
(if complex) spatial rules for maximizing legibility when combining chords both
in sequence and in parallel on a page.
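A hypothetical 1-dimensional rendering of this constructor might look like the following. The type names are illustrative; only the shape of the call, `event(core, listT, listL, listB, listR)`, comes from the text above:

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: the chord's central core and its four attached
// lists of symbols, written as an ordinary 1-dimensional constructor call.
struct Core {
    std::vector<int> noteheads;   // e.g. pitches, one entry per notehead
    std::string durationClass;    // the chord's duration class symbol
};

struct Event {
    Core core;
    std::vector<std::string> listT, listL, listB, listR;
};

Event event(Core core,
            std::vector<std::string> listT, std::vector<std::string> listL,
            std::vector<std::string> listB, std::vector<std::string> listR) {
    return Event{std::move(core), std::move(listT), std::move(listL),
                 std::move(listB), std::move(listR)};
}
```

In the 2-dimensional original, listL and listR would contain one sub-list per notehead; flat lists are used here to keep the sketch short.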
High-level, 2-dimensional languages like music notation could be developed for use
in multiprocessing situations. Orchestra scores are the prototype for coordinated multiprocessing.
3.2 The development of temporal styles
There are two levels at which an editor's symbol libraries can be defined:
Level 1: Defining a completely new symbol (a name ) together with its subordinate
window (its definition or set of control structures). Non-expert users could
skip this level of the software. Some composers are however already using environments
such as IRCAM's OpenMusic which
allow users to drag-and-drop abstract control structures to create custom control structures.
Level 2: Setting the preferences for the default values
of a given set of controls. This could, for example, be done by demonstrating
the values in real time - a method which would be especially important if the user
cannot understand or directly edit the definitions in the library.
Default values for each symbol's known control set could also be related to a statistical
analysis of many instances of that symbol in one or more performances of a given
score. The default value could, for example, easily be related to a Gaussian distribution
of already existing values. A multi-level score of a piece of classical music could
be fed with unlimited numbers of recordings in order to seed its libraries with
a classical performance style.
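A minimal sketch of this seeding step, assuming the measured values for one symbol parameter have already been collected from recordings:

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch: fit a Gaussian to the already-existing values of
// one parameter of one symbol (e.g. measured staccato durations taken
// from many recordings of the same score). Assumes a non-empty sample.
struct Gaussian { double mean; double stddev; };

Gaussian fit(const std::vector<double>& samples) {
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / samples.size();
    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    return {mean, std::sqrt(var / samples.size())};
}

// The library's default for the parameter could then simply be the mean,
// with the spread available for variation within the learned style.
double defaultValue(const Gaussian& g) { return g.mean; }
```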
Some libraries might learn a user's preferences by continuously observing how that
user sets or changes particular values in the score, or allow users to edit such Gaussian distributions directly.
3.3 Relations to Artificial Intelligence
This subject is treated more fully in
The Writing of Style Libraries for Music Notation and Performance, but it
is worth noting that the concepts described here interact with many music- and time-related areas of AI research.
AI research into performance practice and expressivity in particular kinds of music
may well suggest algorithms which could be used to program the temporal meanings
of symbols. This in turn would create written traditions of performance practice
(software is a form of writing) which would especially benefit lesser known performance
traditions and New Music. Individuals would be able to learn a new style realistically
in their own time, making expensive group rehearsals more productive.
AI algorithms could also be used to develop expressive speech for automata.
4 Transcribing Sonal Atoms by Curtis Roads
This project is described here as an example, showing how one particular library
of symbols began to develop.
4.1 The project's background
After reading my essay The Notation
of Time (which is on my website, but which is no longer available in print),
Curtis Roads invited me to give a lecture at the CREATE Institute at UCSB during
February 2002. This gave me a chance to provide an updated version of my thoughts
on the background to the crisis in 20th century music notation (Inherited
Problems and a Proposed Solution). As a follow-on project, he commissioned
me to create a symbolic score of his piece Sonal Atoms which is purely
electronic music, made using particle synthesis methods.
The source materials he sent me included an audio CD of the piece, some background
information about particle synthesis, and some printouts of AudioSculpt screenshots
of the piece.
AudioSculpt provides several tools which make transcribing easier (replay of user-defined
sections, zoom functions for the visual display, a tuning-fork which plays a sine-wave
with the amplitude and frequency of any point in the sonogram).
While we corresponded about certain details, he kindly allowed me to have the last
word in any decisions. I was very happy to accommodate him if this did not conflict
with the project objectives outlined below. We were working to a finite budget,
so I had to ignore certain aspects of the composition (in particular the spatial
distribution of the sounds) and restrict the amount of information I could include
about the pitches. (I usually just notated the main perceived pitch in an event.)
The project objectives can be summarised as follows:
To create a symbolic representation of the piece.
The transcription should be 2-dimensional (printable
on sheets of paper), but conceptually a printout of one of the windows of the GUI described above.
To proceed as if training an agent capable of making
automatic transcriptions of similar pieces - all steps should therefore, in principle, be automatable.
To investigate unforeseen problems and isolate further generalisations.
4.3 Stage 1: Chunking and the basic event classes
The first stage of transcription was to decide which events could be named.
I therefore started by aligning noteheads (with their duration class) directly under
the sonogram while listening to the piece and looking at the sonogram. I used a
trammel having the following duration class values (space-seconds in the sonogram):
After annotating a few pages, it became clear that the events could be subclassed
into Points, Lines and Clouds. (The composer
had already mentioned these categories - Sonal Atoms is one of a
cycle of pieces called Point, Line, Cloud.) I decided that:
All events which last less than 0.1 sec are Points.
Lines have a perceptible pitch, and appear in the sonogram as horizontal
or diagonal lines.
Clouds have no pitch.
The sound of the following example: [audio example]
[Figure: Stage 1, naming events]
4.4 Stage 2: The final score
Before transferring this basic symbolic information to the final score - where spacing
is used not to denote timing, but to improve legibility - various other information had to be added.
4.4.1 Timbre classes
While completing Stage 1, I decided to assign more precise designations of the timbres,
by creating sub-classes of the Points, Lines and Clouds.
The composer asked me to use icons rather than words to designate timbres: “Some
electronic music composers have tried to simulate acoustic instruments. Not me.”
Nevertheless, I used verbal names for the Point classes as a general orientation
while classifying the timbres (by ear and eye). Note that each icon stands for a
range of timbres, just as the duration symbols stand for ranges of
durations. Timbres should also be editable in the GUI...
4.4.2 Additional symbols
The standard pitch symbols are used to label quarter-tone bandwidths. The
symbol for the A above middle C is used to denote any frequency greater or equal
to 433.69 Hz and less than 446.40 Hz. Usually, only the main perceived pitch is
notated. The frequencies were transcribed using the AudioSculpt "tuning-fork" tool.
This performs a sine-wave having the frequency and amplitude of the point at which
it is held in the sonogram, while displaying the values of these parameters in a window.
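The quoted band works out as one quarter-tone (1/24 octave) centred on A = 440 Hz: 440 x 2^(-1/48) is about 433.69 Hz and 440 x 2^(1/48) is about 446.40 Hz. Mapping an arbitrary frequency to its quarter-tone band can be sketched as:

```cpp
#include <cmath>

// Sketch of the quarter-tone bandwidths described above: each pitch
// symbol stands for a band 1/24 of an octave wide, centred on its
// nominal frequency. The returned index counts quarter-tones from
// A = 440 Hz (0 = A, 2 = A sharp, -2 = A flat, and so on).
int quarterTonesFromA440(double freqHz) {
    return static_cast<int>(std::lround(24.0 * std::log2(freqHz / 440.0)));
}
```

Any frequency from about 433.69 Hz up to about 446.40 Hz rounds to index 0, i.e. is notated as the A above middle C, exactly as stated above.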
The symbols for dynamics have been allocated purely subjectively. The equivalent
mechanical process would involve a 2- or 3-dimensional trammel having amplitude,
frequency and timbre-class as its parameters.
Most of the events in the final score have an identification number equal to the
time coordinate of the event in the sonogram rounded to 1/100 sec. This is purely
for reference purposes, so that events can be easily found again in the sonogram.
The numbers do not represent anything perceptible. They are like rehearsal numbers.
The curve of a glissando tries to reflect the shape of the curve in the sonogram.
This cannot be done exactly however, because the space in the final score is subject
to the physical sizes of symbols, and is primarily related to legibility.
Staccato blips. These symbols denote perceptible events nested inside the event
notated with a duration symbol. In this piece, such nested events often occur in
different spatial locations (channels).
When used in conjunction with a timbre icon, this continuation symbol means that
the sound is the same for each event. Successive timbres which sound different even
though they have the same timbre class are notated with a separate timbre icon for each event.
An abstract operator symbol. This is used to describe a combination of two Cloud
timbres. The operation itself is not further specified. It could be some form of
modulation or filtering.
Cautionary Point timbres are used to qualify Cloud timbres.
Printed music scores contain objects (symbols) which are either the names of classes
(durations, pitches, timbres, dynamics etc.) or comments (vertical dotted lines,
rehearsal numbers etc.). In the proposed GUI, symbols can be edited in their own
window and in a subordinate window where their meaning is defined. The libraries
which define the symbols of a notation can be developed into agents capable
of performing a score with an arbitrary degree of realism.
Developing agents, shared by particular communities of users, would embody evolving
music traditions. Interesting consequences for both scholars and pioneers arise
because traditions of performance practice would here, for the first time, be written
(software is a form of writing).
References to books
Boulez, P.: Penser la musique aujourd'hui (Paris, 1963); English translation by
Bradshaw, S. and Bennett, R. R. as Boulez on Music Today (London: Faber and Faber, 1971)
Stockhausen, K.: ...wie die Zeit vergeht... in Texte zur Musik, Band 1 (DuMont,
Cologne, 1963); also in Die Reihe #3; Eng. trans. by Cardew, C. as ...how time passes... (1959)
Stroustrup, B.: The C++ Programming Language, Third Edition. Addison-Wesley, 1997.
ISBN 0-201-88954-4
This was a very exciting conference. I would especially like to thank Martin
Schmucker, the other local conference organisers and Anne Jacobs at the IEEE for
bending a few rules so that I could attend.
The music world is currently in turmoil. Apart from the notation problems (which
are coming home to roost because they have been neglected by the musical establishment
for at least 30 years), it is proving difficult or impossible to enforce the copyright
laws. There were, of course, problems with both notation and copyright before the
internet arrived, but these problems have become even more pressing as a result.
It seems to me that - as in the case of the old music notation paradigm - the 18th
century copyright paradigm is failing, and that something else is urgently needed
to replace it. After listening to the many conference contributions which revolved
around the subject, I have begun to think seriously about developing a tenable position.
Delivering a 25 minute presentation at a conference is very different from writing
a paper on the same subject. A conference is an event, a written paper is
an object. So there were a few differences between what I did and what I
have written above:
The diagram in §2.1 was simpler in the presentation. I was talking to people
with a broad range of backgrounds, and only had a few minutes to get these ideas
across, so had to simplify and use a less than formal language. In a written paper,
one can formulate ideas more precisely, leaving it to the readers to take as much
time as they like to think about what they have read. Listeners don't have
this time, and one has to be careful that they remain attentive, so live presentations
have to be more entertaining, less formal.
§3 - The General Remarks were omitted in the formal presentation, because I
had no time. I was, however, able to talk about these ideas in the breaks...
I approached §4 - the Sonal Atoms project - by playing the beginning
of the piece from a CD while describing the task I had been set, and performing
the whole piece (3' 30") at the end of the presentation (while synchronously displaying
the 19 pages of the score). This was a great way to end the conference (I was the
last speaker). Apart from the talk before mine (which was about the notation of
Gregorian Chant), all the audio examples had had a tempo (they were mostly popular,
commercial music). The conference was trying to find a way to establish standards
for music notation, and many of the speakers were still assuming that the 19th century
time paradigm could be the basis for this. I hope I showed the conference that such
standards are not possible, and at the same time how to solve some of the problems
they were facing...
The printed proceedings
Unfortunately, I was not able to complete this paper in time for it to be printed
in the official conference proceedings. The following extended abstract appeared there instead:
High level symbolic music notations and performance practices are all related to
particular cultural traditions. For example, Gregorian Chant, Elizabethan Lute music,
19th century Romantic, and 20th century Avant-Garde musics all relate to different
philosophies of space and time. A universal standard for music notation or its performance
is therefore neither possible nor desirable.
The web should, on the contrary, allow aural and written traditions to develop freely
and independently. Specialists in particular traditions should be allowed to communicate
efficiently with their peers, using scores and recordings which reflect those traditions,
and which do not distort the subject matter. Living traditions should be allowed to evolve.
This paper approaches the problem by isolating and describing concepts shared by
all music notations, using the architecture of a proposed, general purpose music
editor as a framework. The editor uses nested levels of freely definable symbols
and interchangeable software libraries to encapsulate information about each individual
notation and performance tradition.
Music notations have many similarities to computer programming languages. Editing
music is the programming of events. It is the responsibility of the user to know
or define how the symbols behave in space and what they mean in time. Terminology
and conceptual advances made in the development of computer languages over the past
decades are therefore relevant here.
The proposed libraries use functions to define default values for the behaviours
of the symbols in space and in time. The actual spatial and temporal values of instantiated
symbols in scores can automatically take local contexts into account and/or be interactively
changed by users. The libraries can therefore become arbitrarily complex while remaining
user-friendly (user-transparent). There is no obstacle to them becoming agents,
capable of learning to transcribe and/or perform in a particular style with an arbitrary
degree of realism.
Developing agents, shared by particular communities of users, would embody evolving
music traditions. Interesting consequences for study and development arise because
traditions of performance practice would here, for the first time, be written (software
is a form of writing).
The approach implies that written music distributed over the web should either contain
the information for a particular performance, or that the recipients have learned
the notation and its performance practice, or that they are in possession of an agent
which has done this.
The effort required to create such an environment should be compared to its potential
benefits. The encapsulation of different notation and performance traditions in
software libraries has very large implications for publishers and the commercial
music industry (not to mention the obvious advantages to music scholars and pioneers),
so the advantages of developing in this direction may be considerable. Even small
libraries, containing a few simple symbols, could be very useful in some circumstances
- for example in recording studio software.