This paper was written for the Cast01
symposium at the GMD (now Fraunhofer Institute) in St Augustin, Germany, in the spring of
2001, but it was rejected without detailed comment. This was probably partly because the Cast01
symposium had a bias towards the graphic arts rather than towards music: the reviewers
were not interested in music notation.
It was later submitted to the Second International Conference on Music and Artificial Intelligence,
ICMAI02, in Edinburgh, Scotland, where it was
reviewed and accepted as a category B paper (a poster presentation).
The 2002 revision - The Writing of Style Libraries for Music Notation and Performance
- can be found here (with a few additional
comments written after the conference); the poster (PDF) is here.
I am leaving the original 2001 version of Music Notation and Agents as Performers here
at this website because I want to keep track of the history of these ideas. The 2002 revision
may be clearer in some respects, but it avoids using the word “agent” so as to
circumvent an unproductive semantic debate. However, I still think that the libraries
I am proposing can develop into increasingly “intelligent” composers’ assistants
and performers.
ji. www 17th September 2002
This paper develops my original proposal for a new kind of music authoring software [1], and shows how the application of concept encapsulation techniques
to music notation leads to a promising approach to the development of agent technologies.
The proposed software strictly separates the graphic characteristics of symbols (in space)
from their meanings (in time), and stores these in separate, interconnected, author-definable
libraries.
Account is taken of both user-defined (local) and machine-defined (default) meanings. Composers
either know the context criteria affecting the local meanings of their symbols, or
can demonstrate variations of meaning in real time (as often as required for a statistical
analysis of the context criteria).
A library containing context dependent meanings is a composer's assistant or agent,
capable of performing scores intelligently in real time.
The creation of new, elementary graphic symbols and their integration into symbol libraries
is also described.
Keywords: context, space, time, symbol, meaning, agent
Writing is, in many areas of culture, a precondition for the development of complex ideas
and high level grammars. In western classical music for example, the rules of harmony and
counterpoint, and subtle traditions of performance practice, could only have developed because
that tradition is a written one. Written symbols are strictly speaking timeless, existing
only in space (the deterioration of the physical medium over time is irrelevant, because
the essential characteristics of the symbols can be copied exactly). Written cultures use
timeless symbols as the substrate in which they develop. Development is a function of time,
and needs a frame of reference.
Notational conventions can, however, become obstacles to progress. Written music is currently
in a period of stagnation because the standard symbolic notation conventions embody conceptual
errors about the relation of time to space, and because these conventions continue to provide
the conceptual framework within which composers and performers think
[2]. The errors stem from the 19th century, and their prime causes are:
- the view that time is equivalent to a dimension of space, and the consequent confusion as to whether the notation is symbolic or analog
- the assumption that the symbols can be given a fixed, absolute meaning
- the ubiquity of tempo in 19th century music
In recent years, it has become very clear from the way computer programming languages have
developed that the key to allowing authors to develop their grammars is to provide
them with software which allows them to create new symbols and to redefine the meanings of
existing ones. Authors should be allowed to define and encapsulate their concepts in any way
they like, and to build on preexisting ideas, so as not to have to start from scratch every
time. In computing, this is currently done by providing programmers with interfaces (APIs),
either to libraries or to complete, working applications.
The symbols in standard music notation are the names of concepts which have been learned
by composers and performers (time is not just equivalent to a dimension of space), so music
notation can be thought of as an authoring or programming language whose classic interpreters
are people.
The lowest level music symbols are as small as possible (single characters or simple lines).
These symbols are combined two dimensionally to create larger, more complex symbols and a
maximum density of legible information in the two dimensions of the page. This is important
when one has to read in real time, extract meanings at the highest possible level, and turn
as few pages as possible while doing so.
Current computer programming languages are based on alphanumeric text, and the symbols (the
names of objects or functions) are generally word-sized. Interestingly, such text is usually
formatted in two dimensional space so as to increase (human) legibility. Contrast this situation
with that of ordinary text, where a single string of words and punctuation is simply folded
onto the page.
It may be possible to create specialised computer programming languages, for use outside music,
in which an increased density of information is achieved because they use character-symbols
arranged two-dimensionally instead of simple word sequences. The compiler (parser, interpreter,
performer) would have to be more complicated, but the script could be smaller (faster to transmit).
Note that symbols arranged in three or more dimensions have a still higher density of information...
Because this paper describes an application in music, useful for composers and sound engineers,
I use events to exemplify the meanings of the symbols. But events are also used outside
music, and other meanings are of course possible, so a similar approach could be used in other
disciplines.
I think that local context is a concept which lies outside space and time, and that
its active application to the use of symbols, in all areas of human cognition (philosophy,
literature, painting, architecture, music, computing, mathematics, the sciences etc.), would
have far reaching consequences.
Perception is intrinsically chunked¹
(see, for example, [3]). We perceive whole objects and events,
not the raw physical data into which these can be analysed by using secondary instruments.
Pitch is experienced, but frequency (e.g. 440 Hz) is not. We can say what pitch a note has,
and how that pitch relates to other pitches, but we cannot count the vibrations. Pitch is,
in this sense, elementary in music.²
Events are chunks of otherwise amorphous temporal experience. But they are not necessarily
elementary. Many events can combine to create a single, higher level event.
Music notation is concerned with perceived events, and the lowest level graphic symbol it
uses to represent one is the chord. The simplest chord symbol consists of a single
dot, and more complex chord symbols can be created by clustering elementary symbols (in a
local context) to create complex, word-sized objects - which people can read as single
objects. Such objects can themselves be clustered (creating a higher level local context)
to make compound symbols at a still higher level. Standard music notation contains many types
of connector, such as stems, beams, slurs, barlines etc. which aid legibility by physically
(visually) binding such high level symbols together.
The smallest perceivable, two dimensional symbols are characters, and the simplest of these
is the dot. Dots are used extensively in music notation and text. Their meaning changes according
to the local, graphic context. They can be used, for example, as
-
noteheads (which combine to form chords),
-
staccato indications (above or below chords),
-
duration augmentation (to the right of chords),
and in text as
-
parts of other characters (i, j, ä, ö, ü, :, ;),
-
punctuation (.),
-
bullets (•).
Such a dot (a notehead) may represent a very complex event. For example, in organ or synthesizer
music it may represent an event which has several, programmable pitches. Depending on the
instrument, there may be an intrinsically associated dynamic envelope, a maximum possible
duration etc. The dot symbolises, or is the name for a complex of information at a
lower level. The dot means the settings at that lower level.
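The idea that a dot is simply a name for a complex of lower-level information can be sketched as a small data structure. Everything here (class names, fields, values) is an illustration of the proposal, not part of any existing implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EventSettings:
    """The lower-level complex of information that a single notehead names."""
    pitches: List[int]                    # e.g. several programmable pitches
    envelope: str = "default"             # intrinsic dynamic envelope
    max_duration: Optional[float] = None  # maximum possible duration, in seconds

@dataclass
class Notehead:
    """The graphic dot: a name for the settings at the lower level."""
    settings: EventSettings

    def meaning(self) -> EventSettings:
        # An 'edit' command on the dot would open these settings in a window.
        return self.settings

# A single dot standing for a three-pitch organ or synthesizer event:
dot = Notehead(EventSettings(pitches=[60, 64, 67], max_duration=8.0))
```

Reading the score, one sees only the dot; issuing an edit command reveals the encapsulated settings.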
Figure 1 describes a further development of the music
editing/sequencing software which I originally proposed in [1]. The new proposal incorporates
windowing techniques to navigate levels of the kind described above, at any degree of granularity.
Encapsulation ensures that the inner details of events do not get in the way while one is
trying to concentrate on the grammars special to the higher levels.
Data encapsulation (using linked windows) is a concept very familiar to computer programmers
(and designers of web pages - also two-dimensional collections of symbols...), but it has
not yet been used successfully to incorporate nested levels of symbolic music notation into
music editing software (see for example [4, 5, 6, 7]). Especially
in the domain of time, the assumption that the symbols can be given absolute meanings
(like the symbols of arithmetic) has made it difficult to recognise that they are fundamentally
different from analog, space-time notations.
Music notation symbols are more like words in ordinary language. Their meanings should be
locally and globally user-definable, and users should be able to develop what they mean by
storing those meanings in libraries.
It is not difficult to see that current standard music notation does its best, in two dimensions,
to straddle more than one of the symbolic levels shown in Figure
1. (Chords, which are spelled out with several noteheads can be collapsed to use just
one; some note patterns can be more succinctly expressed using ornament signs; Roman numerals
can be used to represent chord functions; etc., and such strategies are often mixed on the
same piece of paper.)
It is to be expected that low-level symbol clustering will continue to function in the same
way on both computer windows and on paper, because both are two dimensional, but the introduction
of accessible, nested windows ought to enable the notation to develop in ways which were otherwise
unthinkable (because uncontrollable).
If this software is being used in conjunction with a synthesizer, the analog controls will
contain patch information, so deeper levels of those controls are also possible (A2, A3 etc.)
- for example to adjust the sensitivity of one of the controls in A1.
Obviously, a symbol's (e.g. notehead's) meaning is relative to the symbolic level window (SLW)
(S1, S2 etc.) which contains it.
Within an SLW all noteheads have the same parameters. Issuing an edit command for each
notehead in S1 will open a window containing the same set of controls, but with different
values for those controls. Noteheads in other SLWs have different meanings, defined
by different sets of controls (devices).
Users must therefore first decide which graphic symbols can occur in each SLW, and then associate
a set of controls (parameters) to each symbol inside each SLW. Both these procedures can be
simplified by using object libraries.
Libraries of graphic symbols contain both their shapes, and functions describing the way they
move about in space (independent of their meanings). See §3.4.
Libraries containing controls (devices) must contain not only the visual information which
allows them to be displayed on the screen and to function correctly, but also methods which
use the local context to determine their default values (the values they take on without
user intervention). More about this in §3.5.
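The two kinds of library might be sketched as two interconnected lookup tables, one for graphics and one for meanings. All names, rules, and values here are hypothetical illustrations of the proposal, not existing software:

```python
# Graphic library: shapes and spatial behaviour only (nothing about meaning).
graphic_library = {
    "staccato": {"glyph": ".", "placement": "above or below the chord"},
    "notehead": {"glyph": "\u25CF", "placement": "on a staff line or space"},
}

# Meaning library: for each (SLW, symbol) pair, a method which uses the
# local context to determine the symbol's default values.
def staccato_defaults(context):
    # Illustrative rule: shorten the written duration more at fast tempi.
    return {"duration_factor": 0.5 if context.get("tempo", 60) < 120 else 0.4}

meaning_library = {
    ("S1", "staccato"): staccato_defaults,
}

# Without user intervention, the software fills in the defaults:
defaults = meaning_library[("S1", "staccato")]({"tempo": 100})
```

Keeping the two tables separate means a symbol's appearance can be redesigned without touching its meaning, and vice versa.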
Drag-and-drop is a well established method for adding analog control devices to windows (see
e.g. [8]), and a similar approach could be adopted to allow
users to define the control panels associated with high level symbols (e.g. the symbols in
S2).
As at the lowest level S1, it is possible to predefine a set of abstract controls with which
many S2 symbols (e.g. the trill) can be defined (interval width, number of notes, speed etc.),
and to allow such controls to be dragged and dropped by users to define new symbol control
sets. Beginners, and many other users, would not have to deal with this level of the software,
because powerful, standard symbol control definitions would already be available in libraries.
Any graphic symbol must have a defined shape, and a way of moving about in space defined with
respect to its local context.
There are, in music notation, a small number of symbol types, which define the behaviour
of the symbols in space without saying anything about their shapes. New spatial behaviours
are very rare.
For example, chord types might include OrdinaryChord, HeadlessChord, StemlessChord, Rest.
ChordComponents (element types) include: Notehead, Stem, Flag (central), Accidental (to the
left), Auxiliary (above and below), AugmentationDot (to the right).
Auxiliaries include Accent, StaccatoDot, Trill, Turn, RomanNumeral, SungText, ChordDynamic
etc.
Notice that it is never necessary to store the shape of a chord in a library, because chords
are constructed ad hoc in scores, by clustering ChordComponents.
To be instantiatable, elementary symbols must have a defined shape, and these shapes must
form part of the symbols stored in the graphic library. Such shapes can be loaded from a font,
and will not usually have to be created from scratch. Music notation reuses the simplest symbols
as far as possible, and has no need for very complex elements (in this respect it is unlike
ordinary text, whose lowest level meaningful symbols are words - of which there are an unlimited
number).
So, new symbols (e.g. for accents or noteheads), can be created by subclassing an existing
type to inherit a spatial behaviour, and loading the shape from a font.
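This subclassing scheme can be sketched as follows; the class names echo those above, but the anchoring rule and glyph are stand-ins, not the actual Music Toolbox classes:

```python
class ChordComponent:
    """A symbol type: defines spatial behaviour, says nothing about shape."""
    def anchor(self, chord_origin):
        raise NotImplementedError

class Auxiliary(ChordComponent):
    """Auxiliaries are placed above or below the chord."""
    offset = 10  # illustrative vertical distance from the chord

    def anchor(self, chord_origin):
        x, y = chord_origin
        return (x, y + self.offset)

class Accent(Auxiliary):
    """A new elementary symbol: it inherits Auxiliary's spatial behaviour
    and only has to supply a shape, loaded here from a stand-in 'font'."""
    glyph = ">"  # would normally be a character loaded from a music font
```

A new accent or notehead thus costs only a glyph; its behaviour in space comes for free from its type.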
Class libraries defining the spatial behaviour of music symbols exist in my (unpublished)
Music Toolbox software [9]. Almost certainly, they also
form part of the software underlying the standard music notation programs
[10, 11].
When a user adds a symbol to a window, wanting the software to be able to perform it correctly,
the software has to take the symbol's context into account in order to generate the default
values for that symbol's known parameters.
The precise value of each individual parameter in each individual symbol is unique, and relates
both actively and passively to the symbol's context.
An analysis of the way each symbol relates to its context is a necessary part of the process
of discovering what the default values of its parameters should be.
The symbol's context may include not only the local vicinity, but also the meaning of the
same symbol in other contexts.
If the user is unhappy with the default values provided by the software, those values can
either be edited in the window provided, or demonstrated by performing them (perhaps
several times) directly on some input device such as a keyboard. Either way, the values can
then be used in the evaluation of future default values.
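The feedback loop between machine-defined defaults and demonstrated values can be sketched like this; a plain average stands in for the statistical analysis a real system would need, and all names are hypothetical:

```python
class DefaultEngine:
    """Generates default parameter values for a symbol in a context, and
    refines them from the user's edits or demonstrated performances."""
    def __init__(self):
        self.observations = {}  # (symbol, context key) -> demonstrated values

    def default_for(self, symbol, context_key, machine_default):
        seen = self.observations.get((symbol, context_key))
        if seen:
            return sum(seen) / len(seen)  # learned, user-specific default
        return machine_default            # machine-defined default

    def observe(self, symbol, context_key, value):
        self.observations.setdefault((symbol, context_key), []).append(value)

engine = DefaultEngine()
# Before any demonstration, the machine default (here 500 ms) would be used;
# after two demonstrated performances (300 ms and 500 ms), future defaults
# reflect the performer's style instead.
engine.observe("staccato", "slow tempo", 300)
engine.observe("staccato", "slow tempo", 500)
```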
The system ought to be able to learn the user's style, possibly by doing an analysis of several
performances of the same notation. It should be possible to abstract a given performer's preferences,
and to store them in a library which could perform the notation independently.
In addition to demonstrating (which probably requires a complex statistical analysis to produce
interesting results), composers or performers could also, in principle, directly define
the criteria by which a context affects the meaning of a symbol. They usually know why they
perform symbols in a particular way in a particular situation. Many such insights are however
at a high structural level, and are currently difficult to formalise. Research needs to be
done in this area, but it is possible that the problem of high level, long-range criteria
may be solvable using special symbols in high level SLWs.
Many low level criteria can, however, already be fairly easily described (e.g. in many styles,
the final note of a slurred phrase has a particular dynamic envelope - which can be made a
function of the slur symbol).
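Such a low-level criterion could be formalised directly as a function attached to the slur symbol. The rule below is a deliberately simple sketch (the envelope name and note representation are invented for illustration):

```python
def apply_slur_rule(slurred_notes):
    """A formalised low-level criterion: in many styles, the final note
    of a slurred phrase gets a particular dynamic envelope."""
    performed = [dict(note) for note in slurred_notes]  # leave input untouched
    if performed:
        performed[-1]["envelope"] = "tapered"  # the style-specific envelope
    return performed

phrase = [{"pitch": 60}, {"pitch": 62}, {"pitch": 64}]
performed_phrase = apply_slur_rule(phrase)
```

A style library would collect many such functions, one per symbol-in-context.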
Notice that the default values provided by the software must be exact, but that the rules
with which the context is evaluated may only be able to provide a range of values within which
the selected values have to occur. It is even possible to think of the performed context as
being 'inexact' because human memory decays over time. The problems can become as complicated
as one likes. A good strategy is, however, to solve the easiest problems first, and then to
see whether the more difficult ones become easier or go away altogether.
Music (as opposed, for example, to human language) is concerned with a relatively small number
of clearly definable symbols, so it is likely to be a productive test-bed for context
analysis techniques. Such techniques are, I believe, essential for the development of convincing
agents.
[1] James Ingram, 2000: Perspective, Space and Time in Music Notation
[2] James Ingram, 1985: The Notation of Time
[3] Nelson Goodman, 1976: The Languages of Art (Hackett Publishing Company Inc.)
[4] IRCAM Music Representations group
[5] CREATE Research White Papers (link expired)
[6] Acousmographe & GRM Tools (link expired)
[7] CCRMA SynthBuilder
[8] SONIC CORE GmbH: Scope/SP (old link)
[9] James Ingram: Music Toolbox
[10] Coda Music Technology: Finale
[11] Sibelius Software Ltd., England: Sibelius
This diagram represents the state of my ideas in May 2001. These ideas are currently evolving
fairly quickly, so more recent versions of this diagram will be found in more recent papers.
The first version can be found in Perspective,
Space and Time in Music Notation.
An irregular space-time diagram can be resolved into a series of named chunks as follows [1, 2]:
1. Construct the trammel. Ideally, it is constructed for maximum differentiation in the transcription.
(Logarithmically constructed trammels result in conventional legibility.) There are ways to
define "maximum differentiation" so that optimal trammels can be calculated by the software,
on the basis of an analysis of the space-time diagram. Users should,
as always, be given the chance to override such suggestions.
2. Label each impulse (event) with the symbol corresponding to its length in space (in this
case the distance to the next event). Beaming (higher level chunking) has been used to further
increase legibility in this example.
3. The dynamic envelope curve and other time related information (e.g. duration in seconds,
pitch, patch etc.) can be stored in a separate data object associated with (named by) each
symbol, so the symbols themselves can subsequently be spaced for maximum legibility, independently
of that information (as in standard notation). The data associated with each symbol is editable
in the window which opens when the user issues an edit command for that symbol.
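The labelling step (2) can be sketched as quantising each inter-event distance against a logarithmically constructed trammel. The symbol names, reference distances, and nearest-match rule below are illustrative choices, not the paper's actual algorithm:

```python
import math

# An illustrative logarithmic trammel: each duration symbol is associated
# with a reference distance (arbitrary units on the space-time diagram).
TRAMMEL = [("semiquaver", 4), ("quaver", 8), ("crotchet", 16), ("minim", 32)]

def label_impulse(distance_to_next):
    """Label an impulse with the symbol whose reference distance is
    nearest on a logarithmic scale (matching the trammel's construction)."""
    return min(
        TRAMMEL,
        key=lambda entry: abs(math.log(distance_to_next) - math.log(entry[1])),
    )[0]

# Four impulses with irregular spacing resolve into named chunks:
labels = [label_impulse(d) for d in (5, 9, 17, 30)]
```

Maximising differentiation would then amount to choosing the trammel's reference distances so that the labelled transcription loses as little of the original spacing as possible.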
¹ Chunking and quantum concepts are related somehow (this needs
thinking about).
² Metronomic tempi (e.g. crotchet = 72) and frequencies are imperceptible
in the same way. Absolute time relates only to mechanics, not to chunked memory.