This paper proposes a GUI for authoring music with nested music symbols. The spatial aspects
of the symbols (their names and the behaviour of those names in space) are strictly
separated from their temporal meanings. The spatial behaviours and temporal meanings are stored
in separate functions which means that they can become increasingly subtle, and that
different styles of performance can be associated with identical notation conventions.
A distinction is also made between the spatial and temporal symbol definitions (which
are stored in style libraries), and their local instantiations in a particular score.
Users have access to, and can edit, both the definitions (which provide default values) and
particular instantiations of the symbols.
More "intelligent" libraries can be trained by directly demonstrating variations of local
meaning. The sharing and independent development of these libraries by different users implies
that written traditions of performance practice are possible.
Writing is, in many areas of culture, a precondition for the development of complex ideas.
In western classical music for example, the rules of harmony and counterpoint could only have
developed because that tradition is a written one. Written symbols are strictly speaking timeless,
existing only in space (the deterioration of the physical medium over time is irrelevant,
because the essential characteristics of the symbols can be copied exactly). Written cultures
use timeless symbols as the substrate in which they develop. Development is a function of
time, and needs a frame of reference.
Traditions of performance practice (or style) have however always been transmitted aurally,
even in written music (Tappolet 1967, [1]). Such traditions
are now being supported by the use of audio recordings. The development of many 20th century
aural traditions (e.g. special forms of jazz) and the recent redevelopment of performance
practices for Early Music would have been unthinkable without the timeless framework provided
by recordings.
Developing new styles of notation and performance in newly written music is currently very
difficult because the aural communication of performance style requires a great deal of expensive
rehearsal time. As with Early Music, recordings could be used to alleviate the problem, but
styles of new music recordings are not usually characteristic enough to be useful in rehearsals.
Standard music notation comes with a standard performance style. The situation is also made
more difficult because standard western music notation is currently in a state of conceptual
confusion - especially with regard to performance style.
Since at least the beginning of the 19th century, standard music notation has required that the
duration symbols in parallel voices "add up" within a bar. The values used in this
addition are found by combining a basic value for each duration symbol with any modifiers
(tuplets). Each (modified) duration symbol has, in principle, a fixed value which is equivalent
to a segment of absolute time (a number of seconds). Notice that the durations which are added
here have no clear relation to the actual durations which occur in real performances. "Style",
"expressivity" and "performance practice" are deliberately ignored and left only vaguely
defined.
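The "adding up" convention can be sketched with exact fractions (the numeric encoding of the duration symbols below is my own, chosen only for the sketch):

```python
from fractions import Fraction

def value(basic, tuplet=None):
    """Value of a duration symbol: basic = 4 for a quarter, 8 for an
    eighth, etc.; tuplet = (actual, normal), e.g. (3, 2) for a triplet
    fitting 3 notes into the time of 2."""
    v = Fraction(1, basic)
    if tuplet:
        actual, normal = tuplet
        v *= Fraction(normal, actual)
    return v

# A 4/4 bar: two quarters, a triplet of three eighths, a final quarter.
bar = [value(4), value(4)] + [value(8, (3, 2))] * 3 + [value(4)]
assert sum(bar) == 1   # the symbols "add up" within the bar
```

Note that this arithmetic says nothing about the durations that occur in a real performance; it is purely a property of the written symbols.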
The convention that the symbols can be "added up" or "subdivided" is meaningless without a
perceptible tempo of reference which allows the durations to be predicted (Ingram
1985, [2]). It is, however, often difficult to decide whether the durations are predictable
or not in a particular piece of music. The situation is especially critical in slow, late
19th century, Romantic music (Ingram 2002, [3]).
Music does not necessarily have to have a tempo, and if there is none the standard notation
conventions are meaningless. The nineteenth-century practice of notating absolute time (a kind
of clockwork), from which performances deviate by means of "expressivity", broke down
as perceptible tempi disappeared. Note that "style" and "expressivity" are not restricted
to music containing predictable durations.
The notation of tempoless music requires a more general set of concepts.
Concerted attempts to reform music notation (especially by the Avant-Garde ca. 1950-70) failed
because the standard duration symbols (objects in space) were bound tightly to "exact", addable,
subdividable meanings. Other notations were conceived, in contrast, to be somehow "free".
(See, for example, Stockhausen 1956 [4],
Boulez 1963 [5], Karkoschka 1966 [6].) This meant
that ad hoc solutions to notation problems were encouraged, preventing any real conceptual
progress in this area.
If, unlike Romantics, Neo-Classicists and 20th century Avant-Garde composers, we take (different
traditions of) performance practice into account, it is clear that different kinds of music
can be written using the same symbols and graphic conventions. It is not the case that the
standard duration symbols always mean the same thing - either globally across different pieces
of music, or locally within the same piece of music.
If the spatial and temporal definitions of symbols are cleanly separated, then graphic conventions
can be developed to maximise legibility, while their meanings are free to be
developed separately.
Music symbols are the names of concepts which have been learned by composers and performers
(time is not absolute), so music notation can be thought of as an authoring or programming
language whose classic interpreters are people. Many of the lessons learned during the development
of computer languages are therefore applicable to music notation.
In computer programming languages for example, the distinction between a symbol's graphic
appearance (its name) and the definition of what that name means leads to the hierarchic nesting
of symbols. (Note that computer programs are composed on volatile screens, in environments
which rely heavily on windowing to navigate those levels of information.)
It has also become very clear in computing, that the key to allowing authors to develop their
ideas is to provide them with software which allows them to create new symbols and to redefine
the meanings of existing ones. Authors should be allowed to define and encapsulate their concepts
in any way they like, and to build on preexisting ideas, so as not to have to start from scratch
all the time. This is currently done by providing programmers with interfaces, either to libraries
or to complete, working applications.
The lack of development in the concepts underlying the standard music symbols means that advanced
authoring software is currently very dependent on analog "space-time" representations - especially
for the notation of tempoless music (see, for example, the user interfaces of ProTools [7], OpenMusic [8], AudioSculpt [9], Acousmographe [10]
and SynthBuilder [11]). It should be possible to enhance
the utility of such programs by layering libraries of music symbols and their meanings on
top of the analog, machine-oriented representations. Where an event-oriented approach is practicable,
there may also be ways to integrate the existing controls in such programs (knobs, sliders
etc.) meaningfully into window hierarchies containing high level music symbols.
Standard music notation, which evolved for use on two dimensional paper, demonstrates many
important ways in which the density of legible information can be increased on a page. (Musicians
have always needed legible sheet music, with as few page turns as possible.) It combines the
smallest possible symbols (single dots, other characters, single lines) two dimensionally
to create larger, more complex objects (Ingram 2000, [12]).
It also makes extensive use of symbol overloading (see §2.2) and shorthand symbols (such
as the trill) to reduce the amount of space needed for higher level events.
Current computer programming languages are based on one-dimensional alphanumeric text strings
containing characters and word-sized symbols (the names of objects or functions). Interestingly,
such text is usually formatted in two dimensional space so as to increase (human) legibility.
Contrast this situation with that of ordinary text, where a single string of words and punctuation
is simply folded onto the page.
It may be possible to create specialised computer programming languages, for use outside music,
in which an increased density of information is achieved because they use character-symbols
arranged two-dimensionally. The compiler (parser, interpreter, performer) would have to be
more complicated, but the script could be smaller (faster to transmit). Note that symbols
arranged in three dimensions (as in proteins) have a still higher density of information.
Because this paper describes proposals for a music application, useful for composers and sound
engineers, events are used to exemplify the meanings of the symbols. But events also occur
outside music and other meanings are of course possible. The proposals here might, for example,
be useful in the development of expressive speech for automata.
Perception is intrinsically chunked (Goodman 1976 [13]).
We perceive whole objects and events, not the raw physical data into which these can be analysed
by using secondary instruments. Pitch is experienced, but frequency (e.g. 440 Hz) is not.
We can say what pitch a note has, and how that pitch relates to other pitches, but we cannot
count the vibrations. Pitch is, in this sense, elementary in music.
Events are chunks of otherwise amorphous temporal experience. But they are not necessarily
elementary. Many events can combine to create a single, higher level event.
Music notation is concerned with perceived events, and the lowest level graphic symbol it
uses to represent one is the chord. The simplest chord symbol consists of a single dot, and
more complex chord symbols can be created by clustering elementary symbols (in a local context)
to create complex, word-sized objects - which people can read as single objects. Such objects
can themselves be clustered (creating a higher level local context) to make compound symbols
at a still higher level. Standard music notation contains many types of connector, such as
stems, beams, slurs, barlines etc. which aid legibility by physically (visually) binding such
high level symbols together (Ingram 2000, [12]).
“Local context” is a key concept here. A local context is a group of symbols whose
combined meaning can be represented by a symbol at a higher level. This concept is related
to "scope" in computer programming languages. Modifiers such as accidentals or staccato dots
only affect symbols within their local context. In music notation, local contexts are delimited
by spatial proximity, the use of different kinds of connector (slurs, brackets, beams etc.)
and the existence of similar contexts in the neighbourhood.
The smallest perceivable, two dimensional symbols are characters, and the simplest of these
is the dot. Dots are used extensively in music notation and text. Their meaning changes according
to the local, graphic context. They can be used, for example, as
- noteheads (which combine to form chords),
- staccato indications (above or below chords),
- duration augmentation (to the right of chords),
- in text as
  - parts of other characters (i, j, ä, ö, ü, :, ;),
  - punctuation (.),
  - bullets.
Such a dot (a notehead) may represent a very complex event. For example, in organ or synthesizer
music it may represent an event which has several, programmable pitches. Depending on the
instrument, there may be an intrinsically associated dynamic envelope, a maximum possible
duration etc. The dot symbolizes, or is the name for, a complex of information at a lower
level. The dot means the settings at that lower level.
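As a sketch (all names and settings here are invented for illustration), such a dot can be modelled as a name bound to a lower-level complex of settings:

```python
# A notehead "dot" is just a name for settings stored at a lower level.
notehead = {
    "name": "dot",
    "settings": {                  # the lower level the dot "means"
        "pitches": [60, 64, 67],   # e.g. a programmable organ/synth event
        "envelope": "adsr-default",
        "max_duration_ms": 4000,
    },
}

def meaning(symbol):
    """Opening (editing) a symbol reveals the settings it names."""
    return symbol["settings"]
```

Reading the score, one sees only the dot; editing it opens the lower level, exactly as a name in a program refers to its definition.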
Fig. 1 A proposed editor for developing music
Remarks about Fig.1
1. The user is a feedback mechanism changing space into time and vice versa.
2. Event creation is user output (using some kind of instrument such as a microphone, keyboard,
algorithmic synthesis etc.). Event perception is user input (i.e. listening).
3. Object creation is user output (writing, i.e. editing in the GUI windows). Object perception
is user input (i.e. reading).
4. Windows S1, S2 etc. are Symbolic Level Windows (SLWs) containing symbols which are the names
of lower level windows. Windows A1, A2 etc. are Analog Level Windows (ALWs) containing analog
controls and settings which can be used to synthesise a particular performance.
5. Event analysis is the process of chunking the raw data (see §4). Event synthesis is
the process of creating real events from the data stored in the score. The user could also
create events by performing one of the SLWs live, ignoring any temporal information currently
stored in the score. The user would have to use knowledge of a performance style in order
to do this.
6. The libraries for the Analog Level Windows contain the controls used when creating the
events. These may be patch controls (for synthesizers), links to other event-oriented
authoring software, or something analogous to sampler controls or the verbal instructions given to performers.
7. This diagram is the same as the one in the paper
Inherited Problems and a Proposed Solution. For ease of use on screen, it contains
a few more annotations than the one designed for the
poster. [2009: See
below.]
Fig. 1 describes proposals for the global architecture of
an editor for developing music. As in the computer programming environments mentioned above,
the GUI uses windowing techniques to navigate levels of information. Encapsulation ensures
that the inner details of events do not get in the way while one is trying to concentrate
on the relations between the symbols at higher levels.
Notice that none of the symbols in the GUI has a fixed absolute value. Local values are user-editable,
and all default meanings are defined in library functions. The arguments to those functions
are in the same windows (at the same level) as the function names (the symbol names) and form
part of their local context. As with the words of ordinary language, the meanings of music
symbols change according to context.
In this authoring environment, users can access and change both the global definitions of
the symbols they are using (stored in the libraries), and the local values for a particular
performance (stored in the instantiated symbols in the GUI).
It is not difficult to see that current standard music notation does its best, in two dimensions,
to straddle more than one of the symbolic levels shown in Figure 1. (Chords, which are spelled
out with several noteheads, can be collapsed to use just one; some note patterns can be more
succinctly expressed using ornament signs; Roman numerals can be used to represent chord functions;
etc., and such strategies are often mixed on the same piece of paper.)
It is to be expected that low-level symbol clustering will continue to function in the same
way on both computer windows and on paper, because both are two dimensional, but the introduction
of accessible, nested windows ought to enable the notation to develop in ways which were otherwise
unthinkable (because uncontrollable).
If this software were being used in conjunction with a synthesizer, the analog controls in
A1 would contain patch information. Deeper levels of those controls are also possible (A2,
A3 etc.) - for example to adjust the sensitivity of one of the controls in A1 (see also §1.4).
Levels above and beyond S2 could be defined for the composition and analysis of very high
level musical events. One could, in principle, do Schenker analysis or describe compositions
in other ways at these levels.
Obviously, a symbol's (e.g. notehead's) meaning is relative to the Symbolic Level Window (SLW)
(S1, S2 etc.) which contains it.
Within an SLW all noteheads have the same parameters. Issuing an edit command for each notehead
in S1 will open a window containing the same set of controls, but with different values for
those controls. Noteheads in other SLWs have different meanings, defined by different sets
of controls (devices).
Users know a great deal about why they perform particular symbols in particular ways in particular
situations in particular styles. Many such insights are however at a high structural level,
and are currently difficult to formalise. It should however be possible to define high level,
long-range criteria using special symbols in high level SLWs. Many low level criteria can,
however, already be fairly easily described (for example, in many styles, the final note of
a slurred phrase tends to have a particular dynamic envelope).
Users of this software would normally load their (spatial and temporal) symbol definitions
from a library, so beginners and many other users would not have to think about programming
these. The definitions should however be accessible and fairly easily programmable using a
visual programming environment.
Visual programming environments (like the one used in IRCAM's OpenMusic
[8] to program "Maquettes") use icons and connecting lines to construct intuitively
usable, user-accessible control structures similar to those which are necessary here.
Programming the spatial behaviour of graphic objects: Any symbol must have a defined shape,
and a way of moving about in space defined with respect to its local context. Standard music
notation programs (e.g. Finale [14] and
Sibelius [15]) routinely keep symbols in linked lists and similar data structures
so as to simplify editing and redrawing. These structures, which are easy for users to understand
and manipulate, describe each symbol's local context.
There are, in music notation, a small number of symbol types, which define the spatial behaviour
of the symbols independently of their shapes. New spatial behaviours are very rare.
Elementary symbol types, which are represented by characters and simple lines, are clustered
to produce compound symbol types such as chords.
New, elementary symbols (e.g. for accents or noteheads), can be created by subclassing an
existing type to inherit a spatial behaviour, and loading a new shape from a font. Notice
that, as with the temporal meanings, the position of a newly instantiated symbol (its local
value) is initially the default position described by the function in the library, but that
this local value must be editable by the user (for example by dragging or nudging with the
arrow keys).
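A minimal object-oriented sketch of this subclassing scheme (class names are hypothetical, and "noteheadDiamond" is merely an assumed glyph name in some font):

```python
class SymbolType:
    """A spatial behaviour, independent of shape."""
    default_offset = (0.0, 0.0)        # library default in local context

    def __init__(self, shape):
        self.shape = shape             # glyph loaded from a font
        self.offset = self.default_offset   # local value, user-editable

    def nudge(self, dx, dy):
        """User edit: dragging or arrow-keying the instantiated symbol."""
        x, y = self.offset
        self.offset = (x + dx, y + dy)

class Notehead(SymbolType):
    pass

class DiamondNotehead(Notehead):
    """New shape; spatial behaviour inherited unchanged."""
    def __init__(self):
        super().__init__(shape="noteheadDiamond")

d = DiamondNotehead()
d.nudge(0.5, 0.0)   # local value now overrides the library default
```

The default position comes from the library; the edited offset is stored only in this instantiation.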
The separation of meaning from spatial behaviour makes it unlikely that radically new kinds
of graphic symbols will be necessary. The graphic libraries can, for the same reason, begin
with a small set of simple, well understood, highly legible symbols, whose evolution can be
expected to slow down fairly quickly.
Programming the temporal meaning of symbols: The temporal meanings of the symbols can be programmed
by linking icons representing the relevant symbols in the local context. As has been done
in OpenMusic, it is possible to predefine a set of abstract controls with which many symbols
can be defined in terms of others (interval width, number of notes, speed, envelope etc.),
and to allow such controls to be dragged and dropped by users to define new symbol control
sets. A trill, for example, might have a control set including initial pitch, speed, interval
width, final turn. The initial pitch would be related to the pitch of the notehead in the
trill's local context. This definition of a trill defines part of the temporal style.
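A sketch of such a trill control set (the context keys and control names are assumptions made for the sketch):

```python
def trill(context, speed_hz=8.0, interval=2, final_turn=True):
    """Realise a trill: initial pitch comes from the notehead in the
    trill's local context; the other controls define the style."""
    start = context["notehead_pitch"]   # MIDI pitch from local context
    upper = start + interval            # interval-width control
    dur = context["duration_s"]
    step = 1.0 / speed_hz               # speed control
    events, t = [], 0.0
    while t < dur:
        pitch = start if len(events) % 2 == 0 else upper
        events.append((t, pitch))
        t += step
    if final_turn:                      # final-turn control
        events += [(dur, start - 1), (dur + step, start)]
    return events
```

Whether the trill begins on the main or the upper note is itself part of the temporal style the library defines; here it begins on the main note purely by assumption.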
There may be unexpected consequences here: AI research into performance practice and expressivity
in old styles may suggest control structures which could be used to define the temporal meanings
of symbols. The resuscitation of dormant written traditions, and the writing of more expressive
New Music may be the result.
When a user adds a symbol to an SLW, wanting the software to be able to perform it correctly,
the software has to take that symbol's context into account in order to generate the default
values for its known parameters.
The precise value of each individual parameter in each individual symbol is unique, and can
relate both actively and passively to the symbol's local context in ways defined in the library.
To understand the "correctness" of the default value, at least intuitively, is to recognise
the style defined in the library.
Notice that because the default values of particular instantiations of symbols are discovered
by performing calls to functions, there are no limits to the complexity of the spatial or
temporal style. Libraries may initially use very simple procedures, but become more complicated
later. There is no reason why libraries with a recognisable style, but whose inner workings
can only be understood by a few experts, should not develop. Such libraries could be said
to be "more intelligent".
The means by which users change the settings in the libraries is independent of the complexity
of the library. Sophisticated controls may be used even in simple libraries. For example,
default values could be related to some chance operation - perhaps within some controllable
Gaussian distribution. The parameters of such a Gaussian distribution could be controlled
directly or inferred by the library from a series of demonstrations of correct values by the
user.
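A minimal sketch of such a trainable default (all names hypothetical): the library fits a Gaussian to the values a user demonstrates, then samples defaults from it:

```python
import random
import statistics

class LearnedDefault:
    """A library parameter whose default is drawn from a Gaussian
    inferred from user demonstrations of correct values."""
    def __init__(self, initial=0.0, spread=1.0):
        self.demos = []
        self.mean, self.spread = initial, spread

    def demonstrate(self, value):
        """User shows a 'correct' value; the library updates its model."""
        self.demos.append(value)
        self.mean = statistics.fmean(self.demos)
        if len(self.demos) > 1:
            self.spread = statistics.stdev(self.demos)

    def default(self):
        """A plausible default for a newly instantiated symbol."""
        return random.gauss(self.mean, self.spread)
```

The same interface would work whether the library's inner model is this simple Gaussian or something far more opaque and "intelligent".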
Some libraries might learn a user's preferences by observing how that user sets or changes
particular values in the score. This would be especially important if the library is "more
intelligent", and cannot be easily understood or directly edited by the user.
This is the process whereby each event in a series of events (represented in a space-time
diagram) is given a separate symbol (or name), and the event's name is connected to the space-time
information at a lower level. Consider the following space-time diagram:
Fig. 2 Irregular series of events (space-time diagram)
As far as a machine is concerned, this is a single, undifferentiated curve. People however
instinctively break such curves into manageable chunks. Such chunks can be labeled just by
putting a dot on each peak (the dot might be oval, like a notehead). Alternatively, the labels
could be numbers or letters or duration symbols etc. giving more precise information about
the event.
The lengths of the "events" can be classified, irrespective of the existence of a tempo, using
a logarithmically constructed trammel. Using the classic duration symbols means that legibility
can be improved later (horizontal spatial compression, use of beams), and it becomes easy
to develop closely related higher level notations.
Fig. 3 Trammel construction
It would be useful if the standard notation of tempoed music could be treated as a special case here. Standard
notation has evolved to be very legible, so it would be a pity to throw away that advantage.
A histogram can always be constructed from the lengths of the events (for example by first
sorting the lengths into ascending order), so if the diagram contained lengths having proportions
2:1 (as in classical music without triplets), then it would be very easy to construct a trammel
to produce a transcription similar to classical notation. If there are no such proportions
in the original diagram, the user might relate the trammel to the shortest length, or try
to ensure maximum differentiation in the resulting transcription. In any case, the user should
have control over the trammel and the transcription.
Fig. 4 Transcription of duration symbols
Space is being used here to demonstrate the algorithm, but non-dimensional numbers (or bits
in a MIDI stream) would also work. Note that beaming (which has been used freely here) improves
legibility, and has no other function as far as this transcription is concerned.
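The duration trammel described above can be sketched as a logarithmic classifier (the rounding rule and the symbol-name table are assumptions made for the sketch):

```python
import math

def trammel_class(length, unit):
    """Index of the nearest power-of-two multiple of `unit`, i.e. the
    rung of a logarithmically constructed trammel."""
    return round(math.log2(length / unit))

# Assumed mapping of trammel rungs onto classic duration symbols.
DURATION_NAMES = {-2: "sixteenth", -1: "eighth", 0: "quarter", 1: "half"}

lengths = [0.5, 1.0, 2.0, 0.25, 1.0]   # event lengths in seconds, unit = 1.0
symbols = [DURATION_NAMES[trammel_class(l, 1.0)] for l in lengths]
```

If the lengths stand in 2:1 proportions, as here, the transcription falls directly onto neighbouring classical duration symbols; otherwise the user adjusts the unit (or the trammel) to maximise differentiation.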
The use of trammels is generalisable for other parameters. Consider the following:
Fig. 5 Transcription of durations, pitches and dynamics
In addition to using the durations trammel (as previously), this transcription has been made
with a trammel for "dynamics" (the height of each event, see above left) and a trammel for
"pitches" (the colour of the event, see below).
Fig. 6 Trammel for pitches
(The grayscale from black to white is supposed to be continuous.)
Interestingly, the perception of equal steps in both pitch and dynamic is related to logarithmic
steps at the machine level (both the vertical scale and the grayscale in the above diagrams
should be considered logarithmic).
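One familiar instance of this logarithmic relation is the mapping from equal-tempered pitch steps to frequency:

```python
import math

def midi_to_hz(note, a4=440.0):
    """Equal perceived pitch steps are logarithmic frequency steps:
    each semitone multiplies the frequency by 2**(1/12)."""
    return a4 * 2 ** ((note - 69) / 12.0)

# One octave (12 equal steps) doubles the frequency.
ratio = midi_to_hz(81) / midi_to_hz(69)
```

An analogous logarithmic relation holds for dynamics, where equal perceived loudness steps correspond to multiplicative changes at the machine level.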
The "pitch" symbols are purely arbitrary here (e.g. alphanumeric symbols could have been used,
and/or the grayscale might have denoted some other parameter - e.g. synthesizer patch). This
has been done to make it clear that there is but a short step from here to something like
standard notation - and again, as much legibility is being preserved as possible...
Once the actual values have been chunked and given a label (symbol), the windows S1 and A1
can be completed in the score (GUI). The A1 windows contain the actual, precise values taken
from the machine level of the original events. No information is lost.
It is quite conceivable, that more complicated symbols could be similarly defined at this
stage - for example staccato dots and accents, classifying particular envelope forms.
All these trammels (or their numeric equivalents) connect symbols with their generalised meaning,
and are stored in the library S1 together with the definitions for how each symbol moves about
in space. (The set of nested libraries is, of course, a software module which can be selected
and changed by users.)
The character set for event lengths (flags etc.) needs to be complemented by a set of symbols
for vacant spaces between events. The traditional symbols for rests would seem to be the logical
choice (legibility preservation).
Many parameters may have symbols for transformation. But this is not true of durations. Durations
already have a time component, so they cannot transform: there is no such thing as an event
whose length changes. Transformation symbols for other parameters may include
- diminuendo and crescendo hairpins for dynamics,
- glissando lines for pitch,
- general-purpose arrows etc.
Here is the transcription from §4.2 again:
Fig. 7 A raw transcription
Legibility can be improved, and a higher density of information achieved, by:
- moving the symbols closer together horizontally,
- omitting repeated dynamics,
- adding slanted beams to create groups.
Fig. 8 A transcription with legibility improvements
Group symbols such as beams and omitted dynamics might be defined for the second level symbolic
window (S2).
The "pitch" characters are not necessarily related to pitch - they can also be used for other
parameters. This would reduce the number of symbols whose spatial behaviour has to be defined.
Such symbols have abstract uses - they could, for example, be used as general-purpose slider
knobs. Possibly, alternative representations for each parameter should be available (e.g. dynamics
with the traditional symbols or as noteheads).
Fig. 9 Symbol overloading
Any symbolic level window can contain many parallel, multi-parameter tracks. Notice the advantage
of using staves and ledger lines rather than putting all the parallel tracks on top of each
other on the same graph. In this example, only the stafflines and ledger lines have been used
for the eight standard dynamics. MIDI velocities might be chunked with a smaller granularity,
using spaces and/or more ledger lines. There should be a form of "clef" for each staff, indicating
the range of values notated. The view might have two modes: either space-time or horizontally
compressed for maximum information per window.
The sharing and independent development of style libraries (for both notation and performance)
by different users would allow written traditions of performance style to exist for
written music. (Software is a form of writing.) This would be a radical change from the current
position: a temporal style would be as easy (or as difficult) to learn and develop as a graphic
style. Currently, temporal styles have to be learned during expensive rehearsals with two
or more people. The use of temporal style libraries, stored in software, would enable individuals
to learn a new style in their own time, making group rehearsals more productive.
1. Tappolet, W.: Notenschrift und Musizieren. Robert Lienau, Berlin-Lichterfelde
(1967)
2. Ingram J.: The Notation
of Time. Contact Magazine, London (1985).
3. Ingram J.: Inherited
Problems and a Proposed Solution. Ynez lecture at the University of California,
Santa Barbara (2002).
4. Stockhausen K.: ...wie die Zeit vergeht... in Texte zur Musik Band
1 (DuMont, Cologne 1963); also in Die Reihe #3 - Eng. trans. by Cardew, C. as ...how time passes...
(1959)
5. Boulez P.: Penser la musique aujourd'hui (Paris 1963); English translation
by Bradshaw, S. and Bennet, R. R. as Boulez on Music Today (London: Faber and Faber,
1971)
6. Karkoschka, E.: Das Schriftbild der Neuen Musik. Moeck Verlag, Celle,
Germany (1966). English translation by Ruth Koenig as Notation in New Music (Universal
Edition, 1972)
7. Digidesign.: ProTools
8. IRCAM Music Representations group.: OpenMusic
9. IRCAM Music Representations group.: AudioSculpt
10. INA-GRM: Acousmographe & GRM Tools (link expired)
11. CCRMA: SynthBuilder
12. Ingram J.: Perspective, Space
and Time in Music Notation. Proceedings of The 12th Annual Conference on Systems
Research, Informatics, and Cybernetics, Germany (2000)
13. Goodman, N.: The Languages of Art. Hackett Publishing Company Inc.
(1976)
14. Coda Music Technology: Finale
15. Sibelius Software Ltd., England.:
Sibelius
This “preface” was written from 17th-22nd September 2002, on returning from the
conference in Edinburgh for which this paper was written. It does not form part of the paper
itself, but contains some information about the paper's context, and some further thoughts
on the conference, the conference's context and Artificial Intelligence.
This paper is a revision of Music Notation and Agents as Performers
(2001), which was originally written for the
Cast01 symposium at the GMD (now Fraunhofer Institute) in St Augustin, Germany. The
Cast01 organisers rejected the paper without detailed comment. Probably this was partly because
they were more interested in the graphic arts than in music. They were not interested enough
in notation.
Since none of the ideas presented are in print, I was allowed to submit a revised version
to the ICMAI'02 conference, and it
was accepted as a category B paper (a poster presentation). The paper was not presented to the general conference, but I was
able to talk to some interested individuals about the poster.
[2009: the poster was originally published in the ICMAI'02 Additional Proceedings,
but these seem no longer to be accessible. The link from the above ICMAI'02 page is broken, and so is the webmaster's
contact information. I have therefore put the poster (PDF)
here.]
The poster I took to Edinburgh was of course a compressed version of the paper. For illustration
purposes, it additionally contained a section about the current state of a transcription project
I am working on for Curtis Roads (his Sonal Atoms). There would have been no space
in the paper for this example, and in any case the ICMAI deadline was well before the Sonal
Atoms project began. In some ways the poster is easier to grasp than the paper, because
the ideas did not have to be presented sequentially. I expect to complete the Sonal Atoms
transcription in the near future, and then to write a paper about it.
The Writing of Style Libraries for Music Notation and Performance
takes account both of my own progress since the original paper was written, and the ICMAI'02
reviewers' comments (I did my best to add references, and to remove any other misunderstandings).
In particular, the revision avoids the word "agent" so as to circumvent an unproductive semantic
debate. By “agent” I mean a piece of software which can be (and needs to be) trained
over a period of time - in this case to reproduce the fine details of a particular notation
and/or performance style. The level of detail achieved is a function of time, and can go beyond
some (all?) users' expertise or comprehension. Something similar happened with chess-playing
programs (though probably not all of these were trained as "agents" in the sense I am using
here). I still think that the libraries I am proposing can develop in this way into
increasingly “intelligent” music copyists (transcribers) and performers.
Interestingly, the ICMAI reviewers disagreed as to the merits of this paper. At least one
of them thought I was a student trying to bite off more than he could chew. This was, I think,
because they were denied a frame of reference for the words they were reading. I am
rather an unlikely person from their point of view, so it was difficult for them to make sense
of what they were reading. They had no way of knowing how old I am, or indeed anything else
about my background, so they didn't see that the proposals are in fact simple, elegant answers to a
very large complex of problems about which I have had time to think...
Their difficulties were made worse because I did not originally provide enough references
- incidentally reinforcing their conviction that I must be a beginner. Practically none of
what I have written can be found in the printed academic literature. (It can all be found
here at this web site.) While I hope I have learned my lesson about the importance of providing
references, the problem seems for the moment to be self-perpetuating. This paper has also
not made its way into the printed conference proceedings. It is not the first time that a
paper of mine has failed to find its way into academic print. Something very similar happened
to The Notation of Time
in the early 1980s.
Abstract music notation is often treated by academics as a subject which is too
big to tackle. It is indeed a subject requiring a broad interdisciplinary experience which
cannot be acquired during a few years at a university. Even after leaving university, most
careers tend to lead people away from dealing with symbols per se, so that they forget about
their graphic aspects. In my case, this has not happened.
At the beginning of ICMAI'02, it was announced that the authors of the two best category B
papers would be asked to present their work to the conference. So I was hoping that I had
done enough to clarify the paper's context, and that it might after all be possible to transcend
the difficulties outlined above. Unfortunately, this did not happen. For some reason, none
of the category B papers were presented. The best I could do was to go through the poster
with some interested individuals. This is not the easiest way to spread ideas - one gets very
tired of repeating the same things over and over again within a short space of time - and when
those things are multi-valent, one forgets how a particular conversation has developed,
becomes afraid of repeating oneself, and so leaves important things out...
Academics work in a community having a rather special sociological structure. It is therefore
easy for them to ignore or misjudge the effect which this structure has on the way their techniques
and theories develop. Also, it is easy for them to ignore or misjudge the ability of their
work to survive when transplanted into completely different sociological frameworks (for example,
that of artists or professional musicians). Many of the papers delivered at the conference
suffered from these kinds of problem. The result was that some of the best speakers were asking
themselves why they were doing what they were doing.
Interestingly, the justification from pure curiosity can be applied to both the "Pure
Sciences" and the "Arts". It is not an argument for the establishment of a particular "purely
scientific" or "purely artistic" university department. For reasons I give below, I think
that AI studies must be intrinsically interdisciplinary, and that any attempt by the academic
community to constrain them must be related to the sociological environments inherent in
university departments.
In my own sociological environment, there are very practical reasons for reaping the benefits
of AI research. My answer to the "Why?" question is that I would like to have an agent which
I could train to help me with my job. Having a personal "agent" would help me survive.
My agent-student ought to save me time on routine day-to-day work so that I would have more
time for getting on with other, even more complex and enjoyable tasks (like listening and
performing, thinking, reading and writing). An agent could be reproduced mechanically (its
only a piece of software), so the amount of work it could do simultaneously would be unlimited
(which would be good for my bank balance).... And it might survive a little longer than me
too...
Another, less egocentric, reason is that I would like to see (and hear) traditions of written
music developing again. Commonly available, trainable, developing agents would be such
traditions.
I think that intelligence is about dealing with complexity.
AI investigates the strategies we use for simplifying that complexity (hierarchic symbol organisation,
reasoning with insufficient data etc.). Successful AI techniques simplify the complexity out
of existence, so the frame of reference changes. When something has been simplified enough
for it to be programmed into a computer, we tend to think that the word "intelligent" is no
longer appropriate. This means that AI has to be intrinsically interdisciplinary. When the
goalposts move, they take no account of the boundaries of academic disciplines. One has to
follow them to the next tractable problem, wherever that might be...
So the best workers in AI maintain direct contacts with complex realities outside their paid
work (for example by being amateur musicians). Maybe the maintenance of extra-disciplinary
contacts (especially to the "Arts") should even be a condition of employment in an AI job.
But how could one define such "external contacts" in a legal contract? Maybe AI researchers
should not be allowed to "work-to-rule"... There has to be a certain anarchy...
At any rate, AI researchers who combine reductionist ("scientific") and non-reductionist ("artistic")
strategies for dealing with complexity are probably more likely to survive. They are
less likely to run out of problems to solve. Maybe the institutional stresses are already
leading to a solution of the “Two
Cultures” problem.