This paper proposes a GUI for authoring music with nested music symbols. The spatial aspects
of the symbols (their names and the behaviour of those names in space) are strictly
separated from their temporal meanings. The spatial behaviours and temporal meanings are stored
in separate functions which means that they can become increasingly subtle, and that
different styles of performance can be associated with identical notation conventions.
A distinction is also made between the spatial and temporal symbol definitions (which
are stored in style libraries), and their local instantiations in a particular score.
Users have access to, and can edit, both the definitions (which provide default values) and
particular instantiations of the symbols.
More "intelligent" libraries can be trained by directly demonstrating variations of local
meaning. The sharing and independent development of these libraries by different users implies
that written traditions of performance practice are possible.
Writing is, in many areas of culture, a precondition for the development of complex ideas.
In western classical music for example, the rules of harmony and counterpoint could only have
developed because that tradition is a written one. Written symbols are strictly speaking timeless,
existing only in space (the deterioration of the physical medium over time is irrelevant,
because the essential characteristics of the symbols can be copied exactly). Written cultures
use timeless symbols as the substrate in which they develop. Development is a function of
time, and needs a frame of reference.
Traditions of performance practice (or style) have however always been transmitted aurally,
even in written music (Tappolet 1967, [1]). Such traditions
are now being supported by the use of audio recordings. The development of many 20th century
aural traditions (e.g. special forms of jazz) and the recent redevelopment of performance
practices for Early Music would have been unthinkable without the timeless framework provided
by recordings.
Developing new styles of notation and performance in newly written music is currently very
difficult because the aural communication of performance style requires a great deal of expensive
rehearsal time. As with Early Music, recordings could be used to alleviate the problem, but
styles of new music recordings are not usually characteristic enough to be useful in rehearsals.
Standard music notation comes with a standard performance style. The situation is also made
more difficult because standard western music notation is currently in a state of conceptual
confusion - especially with regard to performance style.
Since at least the beginning of the 19th century, standard music notation has required that the
duration symbols in parallel voices "add up" within a bar. The values used in this
addition are found by combining a basic value for each duration symbol with any modifiers
(tuplets). Each (modified) duration symbol has, in principle, a fixed value which is equivalent
to a segment of absolute time (a number of seconds). Notice that the durations which are added
here have no clear relation to the actual durations which occur in real performances. "Style",
"expressivity" and "performance practice" are deliberately ignored and left only vaguely
defined.
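The "adding up" convention can be sketched with exact fractions (the numeric encoding of the duration symbols below is my own, chosen only for the sketch):

```python
from fractions import Fraction

def value(basic, tuplet=None):
    """Value of a duration symbol: basic = 4 for a quarter, 8 for an
    eighth, etc.; tuplet = (actual, normal), e.g. (3, 2) for a triplet
    fitting 3 notes into the time of 2."""
    v = Fraction(1, basic)
    if tuplet:
        actual, normal = tuplet
        v *= Fraction(normal, actual)
    return v

# A 4/4 bar: two quarters, a triplet of three eighths, a final quarter.
bar = [value(4), value(4)] + [value(8, (3, 2))] * 3 + [value(4)]
assert sum(bar) == 1   # the symbols "add up" within the bar
```

Note that this arithmetic says nothing about the durations that occur in a real performance; it is purely a property of the written symbols.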
The convention that the symbols can be "added up" or "subdivided" is meaningless without a
perceptible tempo of reference which allows the durations to be predicted (Ingram
1985, [2]). It is, however, often difficult to decide whether the durations are predictable
or not in a particular piece of music. The situation is especially critical in slow, late
19th century, Romantic music (Ingram 2002, [3]).
Music does not necessarily have to have a tempo, and if there is none the standard notation
conventions are meaningless. The nineteenth-century practice of notating absolute time (a kind
of clockwork), from which performances deviate by means of "expressivity", broke down
as perceptible tempi disappeared. Note that "style" and "expressivity" are not restricted
to music containing predictable durations.
The notation of tempoless music requires a more general set of concepts.
Concerted attempts to reform music notation (especially by the Avant-Garde ca. 1950-70) failed
because the standard duration symbols (objects in space) were bound tightly to "exact", addable,
subdividable meanings. Other notations were conceived, in contrast, to be somehow "free".
(See, for example, Stockhausen 1956 [4],
Boulez 1963 [5], Karkoschka 1966 [6].) This meant
that ad hoc solutions to notation problems were encouraged, preventing any real conceptual
progress in this area.
If, unlike Romantics, Neo-Classicists and 20th century Avant-Garde composers, we take (different
traditions of) performance practice into account, it is clear that different kinds of music
can be written using the same symbols and graphic conventions. It is not the case that the
standard duration symbols always mean the same thing - either globally across different pieces
of music, or locally within the same piece of music.
If the spatial and temporal definitions of symbols are cleanly separated, then graphic conventions
can be developed to maximise legibility, while their meanings are free to be
developed separately.
Music symbols are the names of concepts which have been learned by composers and performers
(time is not absolute), so music notation can be thought of as an authoring or programming
language whose classic interpreters are people. Many of the lessons learned during the development
of computer languages are therefore applicable to music notation.
In computer programming languages for example, the distinction between a symbol's graphic
appearance (its name) and the definition of what that name means leads to the hierarchic nesting
of symbols. (Note that computer programs are composed on volatile screens, in environments
which rely heavily on windowing to navigate those levels of information.)
It has also become very clear in computing, that the key to allowing authors to develop their
ideas is to provide them with software which allows them to create new symbols and to redefine
the meanings of existing ones. Authors should be allowed to define and encapsulate their concepts
in any way they like, and to build on preexisting ideas, so as not to have to start from scratch
all the time. This is currently done by providing programmers with interfaces, either to libraries
or to complete, working applications.
The lack of development in the concepts underlying the standard music symbols means that advanced
authoring software is currently very dependent on analog "space-time" representations - especially
for the notation of tempoless music (see, for example, the user interfaces of ProTools [7], OpenMusic [8], AudioSculpt [9], Acousmographe [10]
and SynthBuilder [11]). It should be possible to enhance
the utility of such programs by layering libraries of music symbols and their meanings on
top of the analog, machine-oriented representations. Where an event-oriented approach is practicable,
there may also be ways to integrate the existing controls in such programs (knobs, sliders
etc.) meaningfully into window hierarchies containing high level music symbols.
Standard music notation, which evolved for use on two dimensional paper, demonstrates many
important ways in which the density of legible information can be increased on a page. (Musicians
have always needed legible sheet music, with as few page turns as possible.) It combines the
smallest possible symbols (single dots, other characters, single lines) two dimensionally
to create larger, more complex objects (Ingram 2000, [12]).
It also makes extensive use of symbol overloading (see §2.2) and shorthand symbols (such
as the trill) to reduce the amount of space needed for higher level events.
Current computer programming languages are based on one-dimensional alphanumeric text strings
containing characters and word-sized symbols (the names of objects or functions). Interestingly,
such text is usually formatted in two dimensional space so as to increase (human) legibility.
Contrast this situation with that of ordinary text, where a single string of words and punctuation
is simply folded onto the page.
It may be possible to create specialised computer programming languages, for use outside music,
in which an increased density of information is achieved because they use character-symbols
arranged two-dimensionally. The compiler (parser, interpreter, performer) would have to be
more complicated, but the script could be smaller (faster to transmit). Note that symbols
arranged in three dimensions (as in proteins) have a still higher density of information.
Because this paper describes proposals for a music application, useful for composers and sound
engineers, events are used to exemplify the meanings of the symbols. But events also occur
outside music and other meanings are of course possible. The proposals here might, for example,
be useful in the development of expressive speech for automata.
Perception is intrinsically chunked (Goodman 1976 [13]).
We perceive whole objects and events, not the raw physical data into which these can be analysed
by using secondary instruments. Pitch is experienced, but frequency (e.g. 440 Hz) is not.
We can say what pitch a note has, and how that pitch relates to other pitches, but we cannot
count the vibrations. Pitch is, in this sense, elementary in music.
Events are chunks of otherwise amorphous temporal experience. But they are not necessarily
elementary. Many events can combine to create a single, higher level event.
Music notation is concerned with perceived events, and the lowest level graphic symbol it
uses to represent one is the chord. The simplest chord symbol consists of a single dot, and
more complex chord symbols can be created by clustering elementary symbols (in a local context)
to create complex, word-sized objects - which people can read as single objects. Such objects
can themselves be clustered (creating a higher level local context) to make compound symbols
at a still higher level. Standard music notation contains many types of connector, such as
stems, beams, slurs, barlines etc. which aid legibility by physically (visually) binding such
high level symbols together (Ingram 2000, [12]).
“Local context” is a key concept here. A local context is a group of symbols whose
combined meaning can be represented by a symbol at a higher level. This concept is related
to "scope" in computer programming languages. Modifiers such as accidentals or staccato dots
only affect symbols within their local context. In music notation, local contexts are delimited
by spatial proximity, the use of different kinds of connector (slurs, brackets, beams etc.)
and the existence of similar contexts in the neighbourhood.
The smallest perceivable, two dimensional symbols are characters, and the simplest of these
is the dot. Dots are used extensively in music notation and text. Their meaning changes according
to the local, graphic context. They can be used, for example, as
- noteheads (which combine to form chords),
- staccato indications (above or below chords),
- duration augmentation (to the right of chords),
- in text as
  - parts of other characters (i, j, ä, ö, ü, :, ;),
  - punctuation (.),
  - bullets.
Such a dot (a notehead) may represent a very complex event. For example, in organ or synthesizer
music it may represent an event which has several, programmable pitches. Depending on the
instrument, there may be an intrinsically associated dynamic envelope, a maximum possible
duration etc. The dot symbolizes, or is the name for, a complex of information at a lower
level. The dot means the settings at that lower level.
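As a sketch (all names and settings here are invented for illustration), such a dot can be modelled as a name bound to a lower-level complex of settings:

```python
# A notehead "dot" is just a name for settings stored at a lower level.
notehead = {
    "name": "dot",
    "settings": {                  # the lower level the dot "means"
        "pitches": [60, 64, 67],   # e.g. a programmable organ/synth event
        "envelope": "adsr-default",
        "max_duration_ms": 4000,
    },
}

def meaning(symbol):
    """Opening (editing) a symbol reveals the settings it names."""
    return symbol["settings"]
```

Reading the score, one sees only the dot; editing it opens the lower level, exactly as a name in a program refers to its definition.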
Fig. 1 A proposed editor for developing music
Remarks about Fig.1
1. The user is a feedback mechanism changing space into time and vice versa.
2. Event creation is user output (using some kind of instrument such as a microphone, keyboard,
algorithmic synthesis etc.). Event perception is user input (i.e. listening).
3. Object creation is user output (writing, i.e. editing in the GUI windows). Object perception
is user input (i.e. reading).
4. Windows S1, S2 etc. are Symbolic Level Windows (SLWs) containing symbols which are the names
of lower level windows. Windows A1, A2 etc. are Analog Level Windows (ALWs) containing analog
controls and settings which can be used to synthesise a particular performance.
5. Event analysis is the process of chunking the raw data (see §4). Event synthesis is
the process of creating real events from the data stored in the score. The user could also
create events by performing one of the SLWs live, ignoring any temporal information currently
stored in the score. The user would have to use knowledge of a performance style in order
to do this.
6. The libraries for the Analog Level Windows contain the controls used when creating the
events. These may be patch controls (for synthesizers), links to other event-oriented
authoring software, or something analogous to sampler controls or the verbal instructions given to performers.
7. This diagram is the same as the one in the paper
Inherited Problems and a Proposed Solution. For ease of use on screen, it contains
a few more annotations than the one designed for the
poster. [2009: See
below.]
Fig. 1 describes proposals for the global architecture of
an editor for developing music. As in the computer programming environments mentioned above,
the GUI uses windowing techniques to navigate levels of information. Encapsulation ensures
that the inner details of events do not get in the way while one is trying to concentrate
on the relations between the symbols at higher levels.
Notice that none of the symbols in the GUI has a fixed absolute value. Local values are user-editable,
and all default meanings are defined in library functions. The arguments to those functions
are in the same windows (at the same level) as the function names (the symbol names) and form
part of their local context. As with the words of ordinary language, the meanings of music
symbols change according to context.
In this authoring environment, users can access and change both the global definitions of
the symbols they are using (stored in the libraries), and the local values for a particular
performance (stored in the instantiated symbols in the GUI).
It is not difficult to see that current standard music notation does its best, in two dimensions,
to straddle more than one of the symbolic levels shown in Figure 1. (Chords, which are spelled
out with several noteheads, can be collapsed to use just one; some note patterns can be more
succinctly expressed using ornament signs; Roman numerals can be used to represent chord functions;
etc., and such strategies are often mixed on the same piece of paper.)
It is to be expected that low-level symbol clustering will continue to function in the same
way on both computer windows and on paper, because both are two dimensional, but the introduction
of accessible, nested windows ought to enable the notation to develop in ways which were otherwise
unthinkable (because uncontrollable).
If this software were being used in conjunction with a synthesizer, the analog controls in
A1 would contain patch information. Deeper levels of those controls are also possible (A2,
A3 etc.) - for example to adjust the sensitivity of one of the controls in A1 (see also §1.4).
Levels above and beyond S2 could be defined for the composition and analysis of very high
level musical events. One could, in principle, do Schenker analysis or describe compositions
in other ways at these levels.
Obviously, a symbol's (e.g. notehead's) meaning is relative to the Symbolic Level Window (SLW)
(S1, S2 etc.) which contains it.
Within an SLW all noteheads have the same parameters. Issuing an edit command for each notehead
in S1 will open a window containing the same set of controls, but with different values for
those controls. Noteheads in other SLWs have different meanings, defined by different sets
of controls (devices).
Users know a great deal about why they perform particular symbols in particular ways in particular
situations in particular styles. Many such insights are however at a high structural level,
and are currently difficult to formalise. It should however be possible to define high level,
long-range criteria using special symbols in high level SLWs. Many low level criteria can,
however, already be fairly easily described (for example, in many styles, the final note of
a slurred phrase tends to have a particular dynamic envelope).
Users of this software would normally load their (spatial and temporal) symbol definitions
from a library, so beginners and many other users would not have to think about programming
these. The definitions should however be accessible and fairly easily programmable using a
visual programming environment.
Visual programming environments (like the one used in IRCAM's OpenMusic
[8] to program "Maquettes") use icons and connecting lines to construct intuitively
usable, user-accessible control structures similar to those which are necessary here.
Programming the spatial behaviour of graphic objects: Any symbol must have a defined shape,
and a way of moving about in space defined with respect to its local context. Standard music
notation programs (e.g. Finale [14] and
Sibelius [15]) routinely keep symbols in linked lists and similar data structures
so as to simplify editing and redrawing. These structures, which are easy for users to understand
and manipulate, describe each symbol's local context.
There are, in music notation, a small number of symbol types, which define the spatial behaviour
of the symbols independently of their shapes. New spatial behaviours are very rare.
Elementary symbol types, which are represented by characters and simple lines, are clustered
to produce compound symbol types such as chords.
New, elementary symbols (e.g. for accents or noteheads), can be created by subclassing an
existing type to inherit a spatial behaviour, and loading a new shape from a font. Notice
that, as with the temporal meanings, the position of a newly instantiated symbol (its local
value) is initially the default position described by the function in the library, but that
this local value must be editable by the user (for example by dragging or nudging with the
arrow keys).
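A minimal object-oriented sketch of this subclassing scheme (class names are hypothetical, and "noteheadDiamond" is merely an assumed glyph name in some font):

```python
class SymbolType:
    """A spatial behaviour, independent of shape."""
    default_offset = (0.0, 0.0)        # library default in local context

    def __init__(self, shape):
        self.shape = shape             # glyph loaded from a font
        self.offset = self.default_offset   # local value, user-editable

    def nudge(self, dx, dy):
        """User edit: dragging or arrow-keying the instantiated symbol."""
        x, y = self.offset
        self.offset = (x + dx, y + dy)

class Notehead(SymbolType):
    pass

class DiamondNotehead(Notehead):
    """New shape; spatial behaviour inherited unchanged."""
    def __init__(self):
        super().__init__(shape="noteheadDiamond")

d = DiamondNotehead()
d.nudge(0.5, 0.0)   # local value now overrides the library default
```

The default position comes from the library; the edited offset is stored only in this instantiation.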
The separation of meaning from spatial behaviour makes it unlikely that radically new kinds
of graphic symbols will be necessary. The graphic libraries can, for the same reason, begin
with a small set of simple, well understood, highly legible symbols, whose evolution can be
expected to slow down fairly quickly.
Programming the temporal meaning of symbols: The temporal meanings of the symbols can be programmed
by linking icons representing the relevant symbols in the local context. As has been done
in OpenMusic, it is possible to predefine a set of abstract controls with which many symbols
can be defined in terms of others (interval width, number of notes, speed, envelope etc.),
and to allow such controls to be dragged and dropped by users to define new symbol control
sets. A trill, for example, might have a control set including initial pitch, speed, interval
width, final turn. The initial pitch would be related to the pitch of the notehead in the
trill's local context. This definition of a trill defines part of the temporal style.
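A sketch of such a trill control set (the context keys and control names are assumptions made for the sketch):

```python
def trill(context, speed_hz=8.0, interval=2, final_turn=True):
    """Realise a trill: initial pitch comes from the notehead in the
    trill's local context; the other controls define the style."""
    start = context["notehead_pitch"]   # MIDI pitch from local context
    upper = start + interval            # interval-width control
    dur = context["duration_s"]
    step = 1.0 / speed_hz               # speed control
    events, t = [], 0.0
    while t < dur:
        pitch = start if len(events) % 2 == 0 else upper
        events.append((t, pitch))
        t += step
    if final_turn:                      # final-turn control
        events += [(dur, start - 1), (dur + step, start)]
    return events
```

Whether the trill begins on the main or the upper note is itself part of the temporal style the library defines; here it begins on the main note purely by assumption.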
There may be unexpected consequences here: AI research into performance practice and expressivity
in old styles may suggest control structures which could be used to define the temporal meanings
of symbols. The resuscitation of dormant written traditions, and the writing of more expressive
New Music may be the result.
When a user adds a symbol to an SLW, wanting the software to be able to perform it correctly,
the software has to take that symbol's context into account in order to generate the default
values for its known parameters.
The precise value of each individual parameter in each individual symbol is unique, and can
relate both actively and passively to the symbol's local context in ways defined in the library.
To understand the "correctness" of the default value, at least intuitively, is to recognise
the style defined in the library.
Notice that because the default values of particular instantiations of symbols are discovered
by performing calls to functions, there are no limits to the complexity of the spatial or
temporal style. Libraries may initially use very simple procedures, but become more complicated
later. There is no reason why libraries with a recognisable style, but whose inner workings
can only be understood by a few experts, should not develop. Such libraries could be said
to be "more intelligent".
The means by which users change the settings in the libraries is independent of the complexity
of the library. Sophisticated controls may be used even in simple libraries. For example,
default values could be related to some chance operation - perhaps within some controllable
Gaussian distribution. The parameters of such a Gaussian distribution could be controlled
directly or inferred by the library from a series of demonstrations of correct values by the
user.
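A minimal sketch of such a trainable default (all names hypothetical): the library fits a Gaussian to the values a user demonstrates, then samples defaults from it:

```python
import random
import statistics

class LearnedDefault:
    """A library parameter whose default is drawn from a Gaussian
    inferred from user demonstrations of correct values."""
    def __init__(self, initial=0.0, spread=1.0):
        self.demos = []
        self.mean, self.spread = initial, spread

    def demonstrate(self, value):
        """User shows a 'correct' value; the library updates its model."""
        self.demos.append(value)
        self.mean = statistics.fmean(self.demos)
        if len(self.demos) > 1:
            self.spread = statistics.stdev(self.demos)

    def default(self):
        """A plausible default for a newly instantiated symbol."""
        return random.gauss(self.mean, self.spread)
```

The same interface would work whether the library's inner model is this simple Gaussian or something far more opaque and "intelligent".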
Some libraries might learn a user's preferences by observing how that user sets or changes
particular values in the score. This would be especially important if the library is "more
intelligent", and cannot be easily understood or directly edited by the user.
This is the process whereby each event in a series of events (represented in a space-time
diagram) is given a separate symbol (or name), and the event's name is connected to the space-time
information at a lower level. Consider the following space-time diagram:
Fig. 2 Irregular series of events (space-time diagram)
As far as a machine is concerned, this is a single, undifferentiated curve. People however
instinctively break such curves into manageable chunks. Such chunks can be labeled just by
putting a dot on each peak (the dot might be oval, like a notehead). Alternatively, the labels
could be numbers or letters or duration symbols etc. giving more precise information about
the event.
The lengths of the "events" can be classified, irrespective of the existence of a tempo, using
a logarithmically constructed trammel. Using the classic duration symbols means that legibility
can be improved later (horizontal spatial compression, use of beams), and it becomes easy
to develop closely related higher level notations.
Fig. 3 Trammel construction
It would be useful if the standard notation of tempoed music could be treated as a special case here. Standard
notation has evolved to be very legible, so it would be a pity to throw away that advantage.
A histogram can always be constructed from the lengths of the events (for example by first
sorting the lengths into ascending order), so if the diagram contained lengths having proportions
2:1 (as in classical music without triplets), then it would be very easy to construct a trammel
to produce a transcription similar to classical notation. If there are no such proportions
in the original diagram, the user might relate the trammel to the shortest length, or try
to ensure maximum differentiation in the resulting transcription. In any case, the user should
have control over the trammel and the transcription.
Fig. 4 Transcription of duration symbols
Space is being used here to demonstrate the algorithm, but non-dimensional numbers (or bits
in a MIDI stream) would also work. Note that beaming (which has been used freely here) improves
legibility, and has no other function as far as this transcription is concerned.
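The duration trammel described above can be sketched as a logarithmic classifier (the rounding rule and the symbol-name table are assumptions made for the sketch):

```python
import math

def trammel_class(length, unit):
    """Index of the nearest power-of-two multiple of `unit`, i.e. the
    rung of a logarithmically constructed trammel."""
    return round(math.log2(length / unit))

# Assumed mapping of trammel rungs onto classic duration symbols.
DURATION_NAMES = {-2: "sixteenth", -1: "eighth", 0: "quarter", 1: "half"}

lengths = [0.5, 1.0, 2.0, 0.25, 1.0]   # event lengths in seconds, unit = 1.0
symbols = [DURATION_NAMES[trammel_class(l, 1.0)] for l in lengths]
```

If the lengths stand in 2:1 proportions, as here, the transcription falls directly onto neighbouring classical duration symbols; otherwise the user adjusts the unit (or the trammel) to maximise differentiation.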
The use of trammels is generalisable for other parameters. Consider the following:
Fig. 5 Transcription of durations, pitches and dynamics
In addition to using the durations trammel (as previously), this transcription has been made
with a trammel for "dynamics" (the height of each event, see above left) and a trammel for
"pitches" (the colour of the event, see below).
Fig. 6 Trammel for pitches
(The grayscale from black to white is supposed to be continuous.)
Interestingly, the perception of equal steps in both pitch and dynamic is related to logarithmic
steps at the machine level (both the vertical scale and the grayscale in the above diagrams
should be considered logarithmic).
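One familiar instance of this logarithmic relation is the mapping from equal-tempered pitch steps to frequency:

```python
import math

def midi_to_hz(note, a4=440.0):
    """Equal perceived pitch steps are logarithmic frequency steps:
    each semitone multiplies the frequency by 2**(1/12)."""
    return a4 * 2 ** ((note - 69) / 12.0)

# One octave (12 equal steps) doubles the frequency.
ratio = midi_to_hz(81) / midi_to_hz(69)
```

An analogous logarithmic relation holds for dynamics, where equal perceived loudness steps correspond to multiplicative changes at the machine level.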
The "pitch" symbols are purely arbitrary here (e.g. alphanumeric symbols could have been used,
and/or the grayscale might have denoted some other parameter - e.g. synthesizer patch). This
has been done to make it clear that there is but a short step from here to something like
standard notation - and again, as much legibility is being preserved as possible...
Once the actual values have been chunked and given a label (symbol), the windows S1 and A1
can be completed in the score (GUI). The A1 windows contain the actual, precise values taken
from the machine level of the original events. No information is lost.
It is quite conceivable, that more complicated symbols could be similarly defined at this
stage - for example staccato dots and accents, classifying particular envelope forms.
All these trammels (or their numeric equivalents) connect symbols with their generalised meaning,
and are stored in the library S1 together with the definitions for how each symbol moves about
in space. (The set of nested libraries is, of course, a software module which can be selected
and changed by users.)
The character set for event lengths (flags etc.) needs to be complemented by a set of symbols
for vacant spaces between events. The traditional symbols for rests would seem to be the logical
choice (legibility preservation).
Many parameters may have symbols for transformation. But this is not true of durations. Durations
already have a time component, so they cannot transform: there is no such thing as an event
whose length changes. Transformation symbols for other parameters may include
- diminuendo and crescendo hairpins for dynamics,
- glissando lines for pitch,
- general-purpose arrows etc.
Here is the transcription from §4.2 again:
Fig. 7 A raw transcription
Legibility can be improved, and a higher density of information achieved, by:
- moving the symbols closer together horizontally,
- omitting repeated dynamics,
- adding slanted beams to create groups.
Fig. 8 A transcription with legibility improvements
Group symbols such as beams and omitted dynamics might be defined for the second level symbolic
window (S2).
The "pitch" characters are not necessarily related to pitch - they can also be used for other
parameters. This would reduce the number of symbols whose spatial behaviour has to be defined.
Such symbols have abstract uses - they could, for example, be used as general-purpose slider
knobs. Possibly, alternative representations for each parameter should be available (e.g. dynamics
with the traditional symbols or as noteheads).
Fig. 9 Symbol overloading
Any symbolic level window can contain many parallel, multi-parameter tracks. Notice the advantage
of using staves and ledger lines rather than putting all the parallel tracks on top of each
other on the same graph. In this example, only the stafflines and ledger lines have been used
for the eight standard dynamics. MIDI velocities might be chunked with a smaller granularity,
using spaces and/or more ledger lines. There should be a form of "clef" for each staff, indicating
the range of values notated. The view might have two modes: either space-time or horizontally
compressed for maximum information per window.
The sharing and independent development of style libraries (for both notation and performance)
by different users would allow written traditions of performance style to exist for
written music. (Software is a form of writing.) This would be a radical change from the current
position: a temporal style would be as easy (or as difficult) to learn and develop as a graphic
style. Currently, temporal styles have to be learned during expensive rehearsals with two
or more people. The use of temporal style libraries, stored in software, would enable individuals
to learn a new style in their own time, making group rehearsals more productive.
1. Tappolet, W.: Notenschrift und Musizieren. Robert Lienau, Berlin-Lichterfelde
(1967)
2. Ingram J.: The Notation
of Time. Contact Magazine, London (1985).
3. Ingram J.: Inherited
Problems and a Proposed Solution. Ynez lecture at the University of California,
Santa Barbara (2002).
4. Stockhausen K.: ...wie die Zeit vergeht... in Texte zur Musik Band
1 (DuMont, Cologne 1963); also in Die Reihe #3 - Eng. trans. by Cardew, C. as ...how time passes...
(1959)
5. Boulez P.: Penser la musique aujourd'hui (Paris 1963); English translation
by Bradshaw, S. and Bennet, R. R. as Boulez on Music Today (London: Faber and Faber,
1971)
6. Karkoschka, E.: Das Schriftbild der Neuen Musik. Moeck Verlag, Celle,
Germany (1966). English translation by Ruth Koenig as Notation in New Music (Universal
Edition, 1972)
7. Digidesign.: ProTools
8. IRCAM Music Representations group.: OpenMusic
9. IRCAM Music Representations group.: AudioSculpt
10. INA-GRM: Acousmographe & GRM Tools (link expired)
11. CCRMA: SynthBuilder
12. Ingram J.: Perspective, Space
and Time in Music Notation. Proceedings of The 12th Annual Conference on Systems
Research, Informatics, and Cybernetics, Germany (2000)
13. Goodman, N.: The Languages of Art. Hackett Publishing Company Inc.
(1976)
14. Coda Music Technology: Finale
15. Sibelius Software Ltd., England.:
Sibelius
This “preface” was written from 17th-22nd September 2002, on returning from the
conference in Edinburgh for which this paper was written. It does not form part of the paper
itself, but contains some information about the paper's context, and some further thoughts
on the conference, the conference's context and Artificial Intelligence.
This paper is a revision of Music Notation and Agents as Performers
(2001), which was originally written for the
Cast01 symposium at the GMD (now Fraunhofer Institute) in St Augustin, Germany. The
Cast01 organisers rejected the paper without detailed comment. Probably this was partly because
they were more interested in the graphic arts than in music. They were not interested enough
in notation.
Since none of the ideas presented are in print, I was allowed to submit a revised version
to the ICMAI'02 conference, and it
was accepted as a category B paper (a poster presentation). The paper was not presented to the general conference, but I was
able to talk to some interested individuals about the poster.
[2009: the poster was originally published in the ICMAI'02 Additional Proceedings,
but these seem no longer to be accessible. The link from the above ICMAI'02 page is broken, and so is the webmaster's
contact information. I have therefore put the poster (PDF)
here.]
The poster I took to Edinburgh was of course a compressed version of the paper. For illustration
purposes, it additionally contained a section about the current state of a transcription project
I am working on for Curtis Roads (his Sonal Atoms). There would have been no space
in the paper for this example, and in any case the ICMAI deadline was well before the Sonal
Atoms project began. In some ways the poster is easier to grasp than the paper, because
the ideas did not have to be presented sequentially. I expect to complete the Sonal Atoms
transcription in the near future, and then to write a paper about it.
The Writing of Style Libraries for Music Notation and Performance
takes account both of my own progress since the original paper was written, and the ICMAI'02
reviewers' comments (I did my best to add references, and to remove any other misunderstandings).
In particular, the revision avoids the word "agent" so as to circumvent an unproductive semantic
debate. By “agent” I mean a piece of software which can be (and needs to be) trained
over a period of time - in this case to reproduce the fine details of a particular notation
and/or performance style. The level of detail achieved is a function of time, and can go beyond
some (all?) users' expertise or comprehension. Something similar happened with chess-playing
programs (though probably not all of these were trained as "agents" in the sense I am using
here). I still think that the libraries I am proposing can develop in this way into
increasingly “intelligent” music copyists (transcribers) and performers.
Interestingly, the ICMAI reviewers disagreed as to the merits of this paper. At least one
of them thought I was a student trying to bite off more than he could chew. This was, I think,
because they were denied a frame of reference for the words they were reading. I am
rather an unlikely person from their point of view, so it was difficult for them to make sense
of what they were reading. They had no way of knowing how old I am, or indeed anything else
about my background, so they didn't see that the proposals are in fact simple, elegant answers to a
very large complex of problems about which I have had time to think...
Their difficulties were made worse because I did not originally provide enough references
- incidentally reinforcing their conviction that I must be a beginner. Practically none of
what I have written can be found in the printed academic literature. (It can all be found
here at this web site.) While I hope I have learned my lesson about the importance of providing
references, the problem seems for the moment to be self-perpetuating. This paper has also
not made its way into the printed conference proceedings. It is not the first time that a
paper of mine has failed to find its way into academic print. Something very similar happened
to The Notation of Time
in the early 1980s.
Abstract music notation is often treated by academics as a subject which is too
big to tackle. It is indeed a subject requiring a broad interdisciplinary experience which
cannot be acquired during a few years at a university. Even after leaving university, most
careers tend to lead people away from dealing with symbols per se, so that they forget about
their graphic aspects. In my case, this has not happened.
At the beginning of ICMAI'02, it was announced that the authors of the two best category B
papers would be asked to present their work to the conference. So I was hoping that I had
done enough to clarify the paper's context, and that it might after all be possible to transcend
the difficulties outlined above. Unfortunately, this did not happen. For some reason, none
of the category B papers were presented. The best I could do was to go through the poster
with some interested individuals. This is not the easiest way to spread ideas - one gets very
tired of repeating the same things over and over again within a short space of time - and when
those things are multi-valent, one forgets how a particular conversation has developed,
becomes afraid of repeating oneself, and so leaves important things out...
Academics work in a community having a rather special sociological structure. It is therefore
easy for them to ignore or misjudge the effect which this structure has on the way their techniques
and theories develop. Also, it is easy for them to ignore or misjudge the ability of their
work to survive when transplanted into completely different sociological frameworks (for example,
that of artists or professional musicians). Many of the papers delivered at the conference
suffered from these kinds of problem. The result was that some of the best speakers were asking
themselves why they were doing what they were doing.
Interestingly, the justification from pure curiosity can be applied to both the "Pure
Sciences" and the "Arts". It is not an argument for the establishment of a particular "purely
scientific" or "purely artistic" university department. For reasons I give below, I think
that AI studies must be intrinsically interdisciplinary, and that any attempt by the academic
community to constrain them must be related to the sociological environments inherent in
university departments.
In my own sociological environment, there are very practical reasons for reaping the benefits
of AI research. My answer to the "Why?" question is that I would like to have an agent which
I could train to help me with my job. Having a personal "agent" would help me survive.
My agent-student ought to save me time on routine day-to-day work so that I would have more
time for getting on with other, even more complex and enjoyable tasks (like listening and
performing, thinking, reading and writing). An agent could be reproduced mechanically (its
only a piece of software), so the amount of work it could do simultaneously would be unlimited
(which would be good for my bank balance).... And it might survive a little longer than me
too...
Another, less egocentric, reason is that I would like to see (and hear) traditions of written
music developing again. Commonly available, trainable, developing agents would be such
traditions.
I think that intelligence is about dealing with complexity.
AI investigates the strategies we use for simplifying that complexity (hierarchic symbol organisation,
reasoning with insufficient data etc.). Successful AI techniques simplify the complexity out
of existence, so the frame of reference changes. When something has been simplified enough
for it to be programmed into a computer, we tend to think that the word "intelligent" is no
longer appropriate. This means that AI has to be intrinsically interdisciplinary. When the
goalposts move, they take no account of the boundaries of academic disciplines. One has to
follow them to the next tractable problem, wherever that might be...
So the best workers in AI maintain direct contacts with complex realities outside their paid
work (for example by being amateur musicians). Maybe the maintenance of extra-disciplinary
contacts (especially to the "Arts") should even be a condition of employment in an AI job.
But how could one define such "external contacts" in a legal contract? Maybe AI researchers
should not be allowed to "work-to-rule"... There has to be a certain anarchy...
At any rate, AI researchers who combine reductionist ("scientific") and non-reductionist ("artistic")
strategies for dealing with complexity are probably more likely to survive. They are
less likely to run out of problems to solve. Maybe the institutional stresses are already
leading to a solution of the “Two
Cultures” problem.