This paper was written for the Cast01
symposium at the GMD (now Fraunhofer Institute) in St Augustin, Germany, in the spring of
2001, but it was rejected without detailed comment. This was probably partly because the Cast01
symposium had a bias towards the graphic arts rather than towards music: the reviewers
were not interested in music notation.
It was later submitted to the Second International Conference on Music and Artificial Intelligence,
ICMAI02, in Edinburgh, Scotland, where it was
reviewed and accepted as a category B paper (a poster presentation).
The 2002 revision - The Writing of Style Libraries for Music Notation and Performance
- can be found here (with a few additional
comments written after the conference); the poster (PDF) is here.
I am leaving the original 2001 version of Music Notation and Agents as Performers here
at this website because I want to keep track of the history of these ideas. The 2002 revision
may be clearer in some respects, but it avoids using the word “agent” so as to
circumvent an unproductive semantic debate. However, I still think that the libraries
I am proposing can develop into increasingly “intelligent” composers’ assistants
and performers.
ji. www 17th September 2002
This paper develops my original proposal for a new kind of music authoring software [1], and shows how the application of concept encapsulation techniques
to music notation leads to a promising approach to the development of agent technologies.
The proposed software strictly separates the graphic characteristics of symbols (in space)
from their meanings (in time), and stores these in separate, interconnected, author-definable
libraries.
Account is taken of both user-defined (local) and machine-defined (default) meanings. Composers
either know the context criteria affecting the local meanings of their symbols, or
can demonstrate variations of meaning in real time (as often as required for a statistical
analysis of the context criteria).
A library containing context dependent meanings is a composer's assistant or agent,
capable of performing scores intelligently in real time.
The creation of new, elementary graphic symbols and their integration into symbol libraries
is also described.
Keywords: context, space, time, symbol, meaning, agent
Writing is, in many areas of culture, a precondition for the development of complex ideas
and high level grammars. In western classical music for example, the rules of harmony and
counterpoint, and subtle traditions of performance practice, could only have developed because
that tradition is a written one. Written symbols are strictly speaking timeless, existing
only in space (the deterioration of the physical medium over time is irrelevant, because
the essential characteristics of the symbols can be copied exactly). Written cultures use
timeless symbols as the substrate in which they develop. Development is a function of time,
and needs a frame of reference.
Notational conventions can, however, become obstacles to progress. Written music is currently
in a period of stagnation because the standard symbolic notation conventions embody conceptual
errors about the relation of time to space, and because these conventions continue to provide
the conceptual framework within which composers and performers think
[2]. The errors stem from the 19th century, and their prime causes are:
- the view that time is equivalent to a dimension of space, and the consequent confusion as to whether the notation is symbolic or analog
- the assumption that the symbols can be given a fixed, absolute meaning
- the ubiquity of tempo in 19th century music
In recent years, it has become very clear from the way computer programming languages have
developed that the key to allowing authors to develop their grammars is to provide
them with software which allows them to create new symbols and to redefine the meanings of
existing ones. Authors should be allowed to define and encapsulate their concepts in any way
they like, and to build on preexisting ideas, so as not to have to start from scratch every
time. In computing, this is currently done by providing programmers with interfaces (APIs),
either to libraries or to complete, working applications.
The symbols in standard music notation are the names of concepts which have been learned
by composers and performers (time is not just equivalent to a dimension of space), so music
notation can be thought of as an authoring or programming language whose classic interpreters
are people.
The lowest level music symbols are as small as possible (single characters or simple lines).
These symbols are combined two dimensionally to create larger, more complex symbols and a
maximum density of legible information in the two dimensions of the page. This is important
when one has to read in real time, extract meanings at the highest possible level, and turn
as few pages as possible while doing so.
Current computer programming languages are based on alphanumeric text, and the symbols (the
names of objects or functions) are generally word-sized. Interestingly, such text is usually
formatted in two dimensional space so as to increase (human) legibility. Contrast this situation
with that of ordinary text, where a single string of words and punctuation is simply folded
onto the page.
It may be possible to create specialised computer programming languages, for use outside music,
in which an increased density of information is achieved because they use character-symbols
arranged two-dimensionally instead of simple word sequences. The compiler (parser, interpreter,
performer) would have to be more complicated, but the script could be smaller (faster to transmit).
Note that symbols arranged in three or more dimensions have a still higher density of information...
Because this paper describes an application in music, useful for composers and sound engineers,
I use events to exemplify the meanings of the symbols. But events are also used outside
music, and other meanings are of course possible, so a similar approach could be used in other
disciplines.
I think that local context is a concept which lies outside space and time, and that
its active application to the use of symbols, in all areas of human cognition (philosophy,
literature, painting, architecture, music, computing, mathematics, the sciences etc.), would
have far reaching consequences.
Perception is intrinsically chunked¹
(see, for example, [3]). We perceive whole objects and events,
not the raw physical data into which these can be analysed by using secondary instruments.
Pitch is experienced, but frequency (e.g. 440 Hz) is not. We can say what pitch a note has,
and how that pitch relates to other pitches, but we cannot count the vibrations. Pitch is,
in this sense, elementary in music.²
Events are chunks of otherwise amorphous temporal experience. But they are not necessarily
elementary. Many events can combine to create a single, higher level event.
Music notation is concerned with perceived events, and the lowest level graphic symbol it
uses to represent one is the chord. The simplest chord symbol consists of a single
dot, and more complex chord symbols can be created by clustering elementary symbols (in a
local context) to create complex, word-sized objects - which people can read as single
objects. Such objects can themselves be clustered (creating a higher level local context)
to make compound symbols at a still higher level. Standard music notation contains many types
of connector, such as stems, beams, slurs, barlines etc. which aid legibility by physically
(visually) binding such high level symbols together.
The smallest perceivable, two dimensional symbols are characters, and the simplest of these
is the dot. Dots are used extensively in music notation and text. Their meaning changes according
to the local, graphic context. They can be used, for example, as
-
noteheads (which combine to form chords),
-
staccato indications (above or below chords),
-
duration augmentation (to the right of chords),
and in text as
-
parts of other characters (i, j, ä, ö, ü, :, ;),
-
punctuation (.),
-
bullets (•).
Such a dot (a notehead) may represent a very complex event. For example, in organ or synthesizer
music it may represent an event which has several, programmable pitches. Depending on the
instrument, there may be an intrinsically associated dynamic envelope, a maximum possible
duration etc. The dot symbolises, or is the name for a complex of information at a
lower level. The dot means the settings at that lower level.
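The idea that a dot is simply a name for a complex of lower-level information can be sketched as a small data structure. Everything here (class names, fields, values) is an illustration of the proposal, not part of any existing implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EventSettings:
    """The lower-level complex of information that a single notehead names."""
    pitches: List[int]                    # e.g. several programmable pitches
    envelope: str = "default"             # intrinsic dynamic envelope
    max_duration: Optional[float] = None  # maximum possible duration, in seconds

@dataclass
class Notehead:
    """The graphic dot: a name for the settings at the lower level."""
    settings: EventSettings

    def meaning(self) -> EventSettings:
        # An 'edit' command on the dot would open these settings in a window.
        return self.settings

# A single dot standing for a three-pitch organ or synthesizer event:
dot = Notehead(EventSettings(pitches=[60, 64, 67], max_duration=8.0))
```

Reading the score, one sees only the dot; issuing an edit command reveals the encapsulated settings.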
Figure 1 describes a further development of the music
editing/sequencing software which I originally proposed in [1]. The new proposal incorporates
windowing techniques to navigate levels of the kind described above, at any degree of granularity.
Encapsulation ensures that the inner details of events do not get in the way while one is
trying to concentrate on the grammars special to the higher levels.
Data encapsulation (using linked windows) is a concept very familiar to computer programmers
(and designers of web pages - also two-dimensional collections of symbols...), but it has
not yet been used successfully to incorporate nested levels of symbolic music notation into
music editing software (see for example [4, 5, 6, 7]). Especially
in the domain of time, the assumption that the symbols can be given absolute meanings
(like the symbols of arithmetic) has made it difficult to recognise that they are fundamentally
different from analog, space-time notations.
Music notation symbols are more like words in ordinary language. Their meanings should be
locally and globally user-definable, and users should be able to develop what they mean by
storing those meanings in libraries.
It is not difficult to see that current standard music notation does its best, in two dimensions,
to straddle more than one of the symbolic levels shown in Figure
1. (Chords, which are spelled out with several noteheads can be collapsed to use just
one; some note patterns can be more succinctly expressed using ornament signs; Roman numerals
can be used to represent chord functions; etc., and such strategies are often mixed on the
same piece of paper.)
It is to be expected that low-level symbol clustering will continue to function in the same
way on both computer windows and on paper, because both are two dimensional, but the introduction
of accessible, nested windows ought to enable the notation to develop in ways which were otherwise
unthinkable (because uncontrollable).
If this software is being used in conjunction with a synthesizer, the analog controls will
contain patch information, so deeper levels of those controls are also possible (A2, A3 etc.)
- for example to adjust the sensitivity of one of the controls in A1.
Obviously, a symbol's (e.g. notehead's) meaning is relative to the symbolic level window (SLW)
(S1, S2 etc.) which contains it.
Within an SLW all noteheads have the same parameters. Issuing an edit command for each
notehead in S1 will open a window containing the same set of controls, but with different
values for those controls. Noteheads in other SLWs have different meanings, defined
by different sets of controls (devices).
Users must therefore first decide which graphic symbols can occur in each SLW, and then associate
a set of controls (parameters) to each symbol inside each SLW. Both these procedures can be
simplified by using object libraries.
Libraries of graphic symbols contain both their shapes, and functions describing the way they
move about in space (independent of their meanings). See §3.4.
Libraries containing controls (devices) must contain not only the visual information which
allows them to be displayed on the screen and to function correctly, but also methods which
use the local context to determine their default values (the values they take on without
user intervention). More about this in §3.5.
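The two kinds of library might be sketched as two interconnected lookup tables, one for graphics and one for meanings. All names, rules, and values here are hypothetical illustrations of the proposal, not existing software:

```python
# Graphic library: shapes and spatial behaviour only (nothing about meaning).
graphic_library = {
    "staccato": {"glyph": ".", "placement": "above or below the chord"},
    "notehead": {"glyph": "\u25CF", "placement": "on a staff line or space"},
}

# Meaning library: for each (SLW, symbol) pair, a method which uses the
# local context to determine the symbol's default values.
def staccato_defaults(context):
    # Illustrative rule: shorten the written duration more at fast tempi.
    return {"duration_factor": 0.5 if context.get("tempo", 60) < 120 else 0.4}

meaning_library = {
    ("S1", "staccato"): staccato_defaults,
}

# Without user intervention, the software fills in the defaults:
defaults = meaning_library[("S1", "staccato")]({"tempo": 100})
```

Keeping the two tables separate means a symbol's appearance can be redesigned without touching its meaning, and vice versa.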
Drag-and-drop is a well established method for adding analog control devices to windows (see
e.g. [8]), and a similar approach could be adopted to allow
users to define the control panels associated with high level symbols (e.g. the symbols in
S2).
As at the lowest level S1, it is possible to predefine a set of abstract controls with which
many S2 symbols (e.g. the trill) can be defined (interval width, number of notes, speed etc.),
and to allow such controls to be dragged and dropped by users to define new symbol control
sets. Beginners, and many other users, would not have to deal with this level of the software,
because powerful, standard symbol control definitions would already be available in libraries.
Any graphic symbol must have a defined shape, and a way of moving about in space defined with
respect to its local context.
There are, in music notation, a small number of symbol types, which define the behaviour
of the symbols in space without saying anything about their shapes. New spatial behaviours
are very rare.
For example, chord types might include OrdinaryChord, HeadlessChord, StemlessChord, Rest.
ChordComponents (element types) include: Notehead, Stem, Flag (central), Accidental (to the
left), Auxiliary (above and below), AugmentationDot (to the right).
Auxiliaries include Accent, StaccatoDot, Trill, Turn, RomanNumeral, SungText, ChordDynamic
etc.
Notice that it is never necessary to store the shape of a chord in a library, because chords
are constructed ad hoc in scores, by clustering ChordComponents.
To be instantiatable, elementary symbols must have a defined shape, and these shapes must
form part of the symbols stored in the graphic library. Such shapes can be loaded from a font,
and will not usually have to be created from scratch. Music notation reuses the simplest symbols
as far as possible, and has no need for very complex elements (in this respect it is unlike
ordinary text, whose lowest level meaningful symbols are words - of which there are an unlimited
number).
So, new symbols (e.g. for accents or noteheads), can be created by subclassing an existing
type to inherit a spatial behaviour, and loading the shape from a font.
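This subclassing scheme can be sketched as follows; the class names echo those above, but the anchoring rule and glyph are stand-ins, not the actual Music Toolbox classes:

```python
class ChordComponent:
    """A symbol type: defines spatial behaviour, says nothing about shape."""
    def anchor(self, chord_origin):
        raise NotImplementedError

class Auxiliary(ChordComponent):
    """Auxiliaries are placed above or below the chord."""
    offset = 10  # illustrative vertical distance from the chord

    def anchor(self, chord_origin):
        x, y = chord_origin
        return (x, y + self.offset)

class Accent(Auxiliary):
    """A new elementary symbol: it inherits Auxiliary's spatial behaviour
    and only has to supply a shape, loaded here from a stand-in 'font'."""
    glyph = ">"  # would normally be a character loaded from a music font
```

A new accent or notehead thus costs only a glyph; its behaviour in space comes for free from its type.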
Class libraries defining the spatial behaviour of music symbols exist in my (unpublished)
Music Toolbox software [9]. Almost certainly, they also
form part of the software underlying the standard music notation programs
[10, 11].
When a user adds a symbol to a window, wanting the software to be able to perform it correctly,
the software has to take the symbol's context into account in order to generate the default
values for that symbol's known parameters.
The precise value of each individual parameter in each individual symbol is unique, and relates
both actively and passively to the symbol's context.
An analysis of the way each symbol relates to its context is a necessary part of the process
of discovering what the default values of its parameters should be.
The symbol's context may include not only the local vicinity, but also the meaning of the
same symbol in other contexts.
If the user is unhappy with the default values provided by the software, those values can
either be edited in the window provided, or demonstrated by performing them (perhaps
several times) directly on some input device such as a keyboard. Either way, the values can
then be used in the evaluation of future default values.
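The feedback loop between machine-defined defaults and demonstrated values can be sketched like this; a plain average stands in for the statistical analysis a real system would need, and all names are hypothetical:

```python
class DefaultEngine:
    """Generates default parameter values for a symbol in a context, and
    refines them from the user's edits or demonstrated performances."""
    def __init__(self):
        self.observations = {}  # (symbol, context key) -> demonstrated values

    def default_for(self, symbol, context_key, machine_default):
        seen = self.observations.get((symbol, context_key))
        if seen:
            return sum(seen) / len(seen)  # learned, user-specific default
        return machine_default            # machine-defined default

    def observe(self, symbol, context_key, value):
        self.observations.setdefault((symbol, context_key), []).append(value)

engine = DefaultEngine()
# Before any demonstration, the machine default (here 500 ms) would be used;
# after two demonstrated performances (300 ms and 500 ms), future defaults
# reflect the performer's style instead.
engine.observe("staccato", "slow tempo", 300)
engine.observe("staccato", "slow tempo", 500)
```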
The system ought to be able to learn the user's style, possibly by doing an analysis of several
performances of the same notation. It should be possible to abstract a given performer's preferences,
and to store them in a library which could perform the notation independently.
In addition to demonstrating (which probably requires a complex statistical analysis to produce
interesting results), composers or performers could also, in principle, directly define
the criteria by which a context affects the meaning of a symbol. They usually know why they
perform symbols in a particular way in a particular situation. Many such insights are however
at a high structural level, and are currently difficult to formalise. Research needs to be
done in this area, but it is possible that the problem of high level, long-range criteria
may be solvable using special symbols in high level SLWs.
Many low level criteria can, however, already be fairly easily described (e.g. in many styles,
the final note of a slurred phrase has a particular dynamic envelope - which can be made a
function of the slur symbol).
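Such a low-level criterion could be formalised directly as a function attached to the slur symbol. The rule below is a deliberately simple sketch (the envelope name and note representation are invented for illustration):

```python
def apply_slur_rule(slurred_notes):
    """A formalised low-level criterion: in many styles, the final note
    of a slurred phrase gets a particular dynamic envelope."""
    performed = [dict(note) for note in slurred_notes]  # leave input untouched
    if performed:
        performed[-1]["envelope"] = "tapered"  # the style-specific envelope
    return performed

phrase = [{"pitch": 60}, {"pitch": 62}, {"pitch": 64}]
performed_phrase = apply_slur_rule(phrase)
```

A style library would collect many such functions, one per symbol-in-context.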
Notice that the default values provided by the software must be exact, but that the rules
with which the context is evaluated may only be able to provide a range of values within which
the selected values have to occur. It is even possible to think of the performed context as
being 'inexact' because human memory decays over time. The problems can become as complicated
as one likes. A good strategy is, however, to solve the easiest problems first, and then to
see whether the more difficult ones become easier or go away altogether.
Music (as opposed, for example, to human language) is concerned with a relatively small number
of clearly definable symbols, so it is likely to be a productive test-bed for context
analysis techniques. Such techniques are, I believe, essential for the development of convincing
agents.
[1] James Ingram, 2000: Perspective, Space and Time in Music Notation
[2] James Ingram, 1985: The Notation of Time
[3] Nelson Goodman, 1976: The Languages of Art (Hackett Publishing Company Inc.)
[4] IRCAM Music Representations group
[5] CREATE Research White Papers (link expired)
[6] Acousmographe & GRM Tools (link expired)
[7] CCRMA SynthBuilder
[8] SONIC CORE GmbH: Scope/SP (old link)
[9] James Ingram: Music Toolbox
[10] Coda Music Technology: Finale
[11] Sibelius Software Ltd., England: Sibelius
This diagram represents the state of my ideas in May 2001. These ideas are currently evolving
fairly quickly, so more recent versions of this diagram will be found in more recent papers.
The first version can be found in Perspective,
Space and Time in Music Notation.
An irregular space-time diagram can be resolved into a series of named chunks as follows [1, 2]:
1. Construct the trammel. Ideally, it is constructed for maximum differentiation in the transcription.
(Logarithmically constructed trammels result in conventional legibility.) There are ways to
define "maximum differentiation" so that optimal trammels can be calculated by the software,
on the basis of an analysis of the space-time diagram. Users should,
as always, be given the chance to override such suggestions.
2. Label each impulse (event) with the symbol corresponding to its length in space (in this
case the distance to the next event). Beaming (higher level chunking) has been used to further
increase legibility in this example.
3. The dynamic envelope curve and other time related information (e.g. duration in seconds,
pitch, patch etc.) can be stored in a separate data object associated with (named by) each
symbol, so the symbols themselves can subsequently be spaced for maximum legibility, independently
of that information (as in standard notation). The data associated with each symbol is editable
in the window which opens when the user issues an edit command for that symbol.
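The labelling step (2) can be sketched as quantising each inter-event distance against a logarithmically constructed trammel. The symbol names, reference distances, and nearest-match rule below are illustrative choices, not the paper's actual algorithm:

```python
import math

# An illustrative logarithmic trammel: each duration symbol is associated
# with a reference distance (arbitrary units on the space-time diagram).
TRAMMEL = [("semiquaver", 4), ("quaver", 8), ("crotchet", 16), ("minim", 32)]

def label_impulse(distance_to_next):
    """Label an impulse with the symbol whose reference distance is
    nearest on a logarithmic scale (matching the trammel's construction)."""
    return min(
        TRAMMEL,
        key=lambda entry: abs(math.log(distance_to_next) - math.log(entry[1])),
    )[0]

# Four impulses with irregular spacing resolve into named chunks:
labels = [label_impulse(d) for d in (5, 9, 17, 30)]
```

Maximising differentiation would then amount to choosing the trammel's reference distances so that the labelled transcription loses as little of the original spacing as possible.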
¹ Chunking and quantum concepts are related somehow (this needs
thinking about).
² Metronomic tempi (e.g. crotchet = 72) and frequencies are imperceptible
in the same way. Absolute time relates only to mechanics, not to chunked memory.