Tick-based Timing

1: Introduction

This proposal was first made in an issue in the MNX GitHub Repository. It is simple, but has interesting consequences that I think need to be properly thought through. Unfortunately, the original issue has been closed — probably because it needs to be resolved in a context that is wider than the one in which it was originally raised. This proposal applies to all music notations that contain event symbols, not just Common Western Music Notation as it existed in 1900.

1.1: original context and revisions log

Original context: See also the minutes of the fortnightly MNX co-chair meetings (January-April 2021).
MNX is intended to be a set of next-generation, web-friendly music notation encodings, related via common elements in their schemas. The first format being developed is MNXcommon1900, which is intended to be the successor to MusicXML. It does not have to be backwardly compatible with MusicXML, so the co-chair wants to provide documentation comparing the different ways in which MNXcommon1900 and MusicXML encode a number of simple examples. Unfortunately, the co-chair first needs to revise the MusicXML documentation in order to do that, so work on MNXcommon1900 has temporarily stopped. The intention is to start work on it again as soon as the MusicXML documentation revision is complete. After 5+ years of debate, MNXcommon1900 is actually in a fairly advanced state.
I have two MNX-related GitHub repositories (the best way to really understand software is to write it):

MNXtoSVG: A (C#) desktop application that converts MNX files to SVG. This application is a test-bed for MNX's data structures, and successfully converts the first completed MusicXML-MNX comparison examples to (graphics only) SVG. When/if MNX includes temporal info, it will do that too (using a special namespace in the SVG).
A fork of the MNX repository: This contains (among other things) the beginnings of a draft schema for MNXcommon1900. The intention is to plunder that schema for other schemas... I'm looking for things that all event-based music notations have in common, so that software libraries can be used efficiently across all such notations. That's important if we want to develop standards that are consistent all over the web.

14th April 2021: first publication. Opened a related thread in the public MEI-L mailing list.

28th April 2021 revision:

Some discussion followed in the MEI-L mailing list, but I need to understand MEI better before continuing there.
Inserted this blue note (§1.1)
Realised, during the MEI-L discussion, that the <millisecondsPerTick> element can be omitted if playback of the file is not required. Changed §3.3 accordingly.

1.2: general considerations

The main purpose of written music symbols is to be an aide-memoire used in human-human communication. Its encoding should therefore reflect the limits of human perception. Human brains are information processors, not mechanical time-pieces, so they need time to do the processing. Infinitely divisible, smooth time is just a persistent illusion. The proposal therefore reflects the (imperceptibly) granular structure of humanly perceived time.
The complex meanings of music symbols, including style, performance practice etc., are learned in music lessons and rehearsals. Real time performance practice is a matter of tradition, taste and artistry, and is dependent on both the score (composer) being interpreted and the interpreter. Broad, locale- and epoch-related conventions apply, but the symbols never have a universally applicable, absolute (mechanical) temporal meaning.
The Common Western symbols have evolved over several centuries to become extremely efficient at encapsulating information. They are extremely legible and are familiar all over the world, so there are good reasons for continuing to use them, possibly with slightly refined graphical conventions, but allowing backward compatibility. This does not mean that the world’s other music notations can be neglected.
(Temporal conventions continue to be a matter of learned performance practice, artistry etc., regardless of how the graphics are defined.)

Mechanical audio recordings have existed since the end of the 19th century, so it obviously makes sense to augment the symbols with an interface for human-machine and machine-machine communication.
Including exact, machine readable, durations for events would:

allow composers to add real time information to their scores, and performers to listen to demonstration timings before rehearsing with others. This can shorten expensive rehearsal time, and promote the development of performance styles for written music. (Note that Jazz styles could not have developed without the existence of recordings.)
enable cursor synchronization in educational applications.
enable high-level encapsulation of temporal data in the GUIs of Digital Audio Workstations.
allow the development of appropriate notation conventions for the automatic transcription of audio files.
enable comparative analysis of different interpretations of the same score.

But, for these use-cases to be viable:

It must be possible to define machine-readable durations for individual events.
The time resolution (tick-size) must be below the threshold of human perception.

2: First principles

The following definitions define events and ticks as they relate to human-machine communication. They are notation agnostic.

2.1: event and tick definitions

An event is a temporal object symbolised, in music notation, by a duration symbol (a spatial object).
In time, each event has an onset-time and a duration.
In space, each duration symbol has position, width and height.
In music notation, an event's onset-time is considered to be instantaneous. We are not concerned with the ramp in the envelope that can be observed using special apparatus.
An event may have no audible content: A rest is a duration symbol denoting a silent event.
An event may contain a single level of nested onset-times and durations (that have no duration symbols of their own). In this case, the event’s duration symbol can be decorated in some way (e.g. with one of the usual ornament signs). Below that single level, onset-times are no longer perceived. Timbre is perceived instead.

A tick is an imperceptible quantum of time.
Each tick has an instantaneous onset-time (tick.time) and an abstract duration (called a tick).

When using tick-based timing, all events begin and end on a tick grid. So:
Each event begins at an integral tick.time, and has a duration that is an integral number of ticks.

In all (ancient and modern) scores containing western polyphonic music (See e.g. Apel):

The duration symbols for synchronous events are vertically aligned so as to be consistent with ordinary, horizontally written text. (Asian polyphonic music would be rotated by 90°.)

(To be really pedantic: “synchronous events” only have to be perceived to be “synchronous”, and “aligned duration symbols” only have to be perceived to be “aligned”. Inaccuracies that might be revealed using some measuring apparatus don’t matter. That's important in human performances and when writing scores by hand. In machine performances and score-writing by computer, it's easiest just to let the machines take over...)

This means that:
Ticks carry both temporal and spatial information.
In particular, synchronous events at the beginning of a bar have the same tick.time, so:

The (abstract) tick durations of the events in each of the parallel voices in a bar add up to the same value.

In other words:
Bars “add up” in (abstract) ticks.
The same is true for parallel voices in systems (that are as wide as the page allows) even when there are no barlines, so:
Systems also “add up” in (abstract) ticks.
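The "adding up" rule can be checked mechanically. The following is a minimal sketch, not part of the proposal itself: the Event class, the helper names and the example values (a crotchet of 1000 ticks) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tick_time: int      # onset: an integral position on the tick grid
    tick_duration: int  # an integral number of ticks


def voice_from_durations(durations, start_tick=0):
    """Lay events end to end, starting at start_tick."""
    events, t = [], start_tick
    for d in durations:
        events.append(Event(t, d))
        t += d
    return events


def bar_adds_up(voices):
    """True if every voice in the bar spans the same number of ticks."""
    totals = {sum(e.tick_duration for e in voice) for voice in voices}
    return len(totals) == 1


# Hypothetical bar: two quavers and a crotchet against a minim.
upper = voice_from_durations([500, 500, 1000])
lower = voice_from_durations([2000])
print(bar_adds_up([upper, lower]))  # True
```

The same check works for a whole system without barlines, since it only compares totals.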

The “absolute” duration of a single tick will generally end up being shorter than ca. 1 millisecond, but note carefully that durations measured in ticks remain abstract until related to some “absolute” duration. This can be done, for example, by providing a millisecondsPerTick value.
Consider the following dimensional analysis:
If [tick] represents the dimension(s) of a tick, and [time] is the dimension of seconds, milliseconds etc., then a real time value (having dimension [time]) is created when multiplying ([tick] x ([time]/[tick])).
It can be seen that, whatever the dimension(s) of [tick] might be, they cancel out. It's therefore possible to treat ticks as dimensionless numbers, without ruling out the possibility that more inscrutable dimensions are involved.
It's important not to assume anything about ticks since we have no real knowledge of what goes on below the level of human consciousness. Time may just be the brain's way of coping with a more complex underlying reality.

2.2: “absolute” durations

The term “absolute” dates back at least to Newton, though it has not always meant the same thing. See, for example, the Stanford Encyclopedia of Philosophy. Note that Newton's “absolute time” is a kind of temporal ether, and that he was thinking in terms of flat, Euclidean, space and time. Non-Euclidean geometries were first developed in mathematics during the 19th century, and became important in physics at the beginning of the 20th...

Each event can be given an “absolute” duration (a duration in seconds, milliseconds etc.) by multiplying its abstract tick duration by a value that uses conventional temporal units.
Note that seconds, milliseconds etc. are not really “absolute”. They:

depend on the use of external apparatus (clocks, metronomes, computers)
are simply universally agreed units that enable inter-human, human-apparatus and inter-apparatus communication.

The ratio between tick duration and seconds can be defined using a value such as ticks-per-second, milliseconds-per-tick etc. Note that

The ratio need not be an integer, and it can only change at a particular tick-position.
The accuracy of the resulting “absolute” duration depends on the apparatus being used.
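As an illustration of how a ratio that changes at particular tick-positions turns abstract ticks into "absolute" time, here is a minimal sketch. The function name and the list-of-pairs representation of the ratio changes are assumptions, not part of any MNX schema.

```python
from bisect import bisect_right


def ticks_to_ms(tick, ratio_changes):
    """Return the absolute time (in ms) of a tick position.

    ratio_changes: list of (tick_position, ms_per_tick) pairs, sorted by
    tick position, whose first entry is at tick 0. Each ratio applies from
    its tick position up to the next change.
    """
    positions = [p for p, _ in ratio_changes]
    last = bisect_right(positions, tick) - 1  # index of the ratio in force
    ms = 0.0
    for i in range(last + 1):
        start, ratio = ratio_changes[i]
        end = ratio_changes[i + 1][0] if i < last else tick
        ms += (end - start) * ratio
    return ms


# Hypothetical score: 1 ms per tick, doubling in speed at tick 2000.
changes = [(0, 1.0), (2000, 0.5)]
print(ticks_to_ms(1000, changes))  # 1000.0
print(ticks_to_ms(3000, changes))  # 2500.0
```

Events in parallel voices that share a tick position always get the same millisecond value, whatever the ratios are.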

On the web: Javascript numbers are IEEE 754 double-precision values, which carry about 15 to 17 significant decimal digits. (The function Number.toFixed(decimalPlaces) only formats a number as a string with the requested number of decimal places; it does not add precision.) That is still far more precision than is needed for conversion to MIDI 1.0's metrical-time ticks or MIDI 2.0 timestamps, both of which work at the microsecond (1 x 10^-6 second) scale.
A score that takes 100 hours to play would have (100 x 60 x 60 x 1000) = (36 x (10^7)) millisecond ticks.

On the web: That's well within Javascript's MAX_SAFE_INTEGER limit (which is greater than 9 x (10^15)).
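The arithmetic can be confirmed in a few lines. This is only a check of the figures quoted above; the constant names are chosen here for illustration, and 2^53 - 1 is the value of Javascript's Number.MAX_SAFE_INTEGER.

```python
MS_PER_HOUR = 60 * 60 * 1000
MAX_SAFE_INTEGER = 2**53 - 1  # value of JavaScript's Number.MAX_SAFE_INTEGER

# One tick per millisecond, for a score lasting 100 hours.
ticks = 100 * MS_PER_HOUR
print(ticks)                     # 360000000
print(ticks < MAX_SAFE_INTEGER)  # True
```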

The “absolute” duration of an event can be changed using any of the following methods:

Change the ratio between tick duration and (milli)seconds.
Even if the ratio changes gradually over time (as in an accel. or rit.), this would not change the tick-positions of parallel events, so would not affect their relative alignments.
Change the event's tick duration.
Note that the following operations are independent of the ratio between ticks and absolute time, so real-time performance practice is unaffected.
- Rubato in a single voice:
  If the tick durations of other events in the same voice are changed accordingly, an event's tick duration can be changed without changing the voice's total tick duration. This operation can change the alignment of events in one voice with respect to events in other voices in the same bar.
- Fermatas, rubato in parallel voices etc. (See also grace notes below):
  If the same number of ticks is added, at the same tick.time, to all the voices in a bar, the bar's tick duration will change, but the relative alignments of the contained events will not.
This means that while the standard duration symbols can have generally applicable default tick durations, the value can change for individual events. Which means, in turn, that
- Different bars need not have the same tick duration, even when their duration symbols "add up" classically.
- A particular bar can contain duration symbols that do not "add up" classically.
  (See §4.1: non-standard use of the Common Western event symbols below.)
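The two kinds of tick-duration change described above can be sketched as follows. This is a hedged illustration only: voices are represented as plain lists of tick durations, and the function names (rubato, add_ticks_at, onsets) are hypothetical.

```python
def rubato(durations, index, delta, donor):
    """Lengthen one event by delta ticks, shortening another in the same voice."""
    out = list(durations)
    out[index] += delta
    out[donor] -= delta
    if min(out) <= 0:
        raise ValueError("tick durations must stay positive")
    return out


def add_ticks_at(durations, tick_time, extra):
    """Fermata-like stretch: lengthen the event sounding at tick_time."""
    out, t = [], 0
    for d in durations:
        out.append(d + extra if t <= tick_time < t + d else d)
        t += d
    return out


def onsets(durations):
    """Onset tick.time of each event, relative to the start of the bar."""
    result, t = [], 0
    for d in durations:
        result.append(t)
        t += d
    return result


voice_a = [1000, 1000, 1000, 1000]
voice_b = [2000, 2000]

stolen = rubato(voice_a, index=0, delta=100, donor=1)
print(stolen, sum(stolen))       # [1100, 900, 1000, 1000] 4000

held_a = add_ticks_at(voice_a, 2000, 500)
held_b = add_ticks_at(voice_b, 2000, 500)
print(sum(held_a), sum(held_b))  # 4500 4500
```

In the first case the voice's total is unchanged but its internal alignment with other voices shifts. In the second, the bar grows from 4000 to 4500 ticks while onsets that were synchronous stay synchronous.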

Tempo (crotchets per minute) is being replaced here by imperceptible milliseconds-per-tick values. Controlled changes to the absolute durations of events can therefore be made not only by adjusting their individual tick durations but also by warping the tick-grid.
Such adjustments are not possible using the late 19th century (Newtonian-Euclidean) time paradigm that associates fixed (mechanical) tick durations to the Common Western duration symbols.
Another way to look at this is to consider it as being a way to deal quantitatively with (the vaguely defined) 19th century “expressivity”.

3: Common Western Music Notation (as used at the beginning of the 20th century)

This notation is still very widely used, so needs special attention here. It uses both grace-notes and tuplets.

3.1 grace notes

If a bar's tick duration is to remain unchanged, then the tick durations of events notated as grace-notes have to be subtracted from the tick duration(s) of some other event(s) in the same voice in that bar. Alternatively, the bar's tick duration can be increased, and the tick durations of the grace-notes inserted in the tick durations of all the parallel events. Contrary to common practice when making bars “add up”, grace notes are not “outside time”. If applications used ticks to determine vertical alignments, inter-application layout would become both more meaningful and more reliable.
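The first option (keeping the bar's tick duration unchanged) can be sketched like this. It is an assumption-laden illustration: the function name, the choice of donor event and the grace-note lengths are all hypothetical, and real performance practice would decide which event gives up the ticks.

```python
def insert_grace_notes(durations, index, grace_durations, steal_from="previous"):
    """Insert grace notes before durations[index] without changing the total.

    The ticks used by the grace notes are subtracted either from the
    previous event ("previous") or from the main note itself ("main").
    """
    out = list(durations)
    needed = sum(grace_durations)
    donor = index - 1 if steal_from == "previous" else index
    if donor < 0 or out[donor] <= needed:
        raise ValueError("donor event is too short for these grace notes")
    out[donor] -= needed
    return out[:index] + list(grace_durations) + out[index:]


voice = [1000, 1000]
print(insert_grace_notes(voice, 1, [60, 60]))          # [880, 60, 60, 1000]
print(insert_grace_notes(voice, 1, [60, 60], "main"))  # [1000, 60, 60, 880]
```

Either way the voice still adds up to 2000 ticks, so the grace notes are "inside time" and have definite positions on the tick grid.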

3.2 tuplets

A default number of ticks per duration symbol can be initialised at the start of a score. The value can be adjusted later, as the score progresses.
In ordinary CWMN, each basic (non-tuplet) duration class has twice as many ticks as the next smaller duration class. Dotted notes also have default tick durations that are calculated in the usual way.

For example: If the default tick duration of a crotchet is 1000 ticks, then the default numbers of ticks for some of the other basic (non-tuplet) duration classes would be:

duration class symbol	default tick duration
semiquaver	250
quaver	500
crotchet	1000
minim	2000
semibreve	4000
dotted semiquaver	375
dotted crotchet	1500
double dotted crotchet	1750
triple dotted crotchet	1875
etc.
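The table can be generated rather than written out by hand. The sketch below assumes only what the text states (doubling between basic classes, dots adding successive halves); the function names are hypothetical.

```python
def basic_durations(crotchet_ticks):
    """Default ticks for the basic classes from semiquaver to semibreve."""
    if crotchet_ticks % 4:
        raise ValueError("crotchet must be divisible by 4 to reach semiquavers")
    names = ["semiquaver", "quaver", "crotchet", "minim", "semibreve"]
    # Each class has twice as many ticks as the next smaller one.
    return {name: crotchet_ticks // 4 * 2**i for i, name in enumerate(names)}


def dotted(ticks, dots):
    """Tick duration of a note with the given number of augmentation dots."""
    total = extra = ticks
    for _ in range(dots):
        if extra % 2:
            raise ValueError("dot does not fall on the tick grid")
        extra //= 2
        total += extra
    return total


table = basic_durations(1000)
print(table["semiquaver"], table["semibreve"])  # 250 4000
print(dotted(table["crotchet"], 3))             # 1875
```

The ValueError cases show where a chosen crotchet value is too coarse: a dot or a small duration class would fall between ticks.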

Calculating event durations in this way ignores performance practice. For example, it would be more accurate to define a duration template for the default tick durations in each bar of a Viennese waltz. Notes inégales could also be defined differently.

The default tick durations actually depend on:

the shortest basic duration class symbol
the required tuplets (triplets, quintuplets, septuplets etc.)
the approximate absolute duration of a crotchet (ticks must be imperceptibly short)

The actual (default) tick durations for each basic duration class can be found by repeatedly doubling the value for the smallest basic duration class, so:
First locate the smallest basic (not dotted, non-tuplet) duration class dcMin (this could, for example, have three flags), and let ticksMin be the minimum tick duration of a dcMin.
Note that ticksMin does not need to be divisible by 2: by definition, there is no basic duration class smaller than a dcMin.

Let dcMin+ be the next larger basic duration class (if dcMin has three flags, dcMin+ has two).
Then dcMin+ has a tick duration of (2 * ticksMin) ticks.
If a dcMin+ is to be evenly divisible by 3 (triplets), then (2 * ticksMin) must be divisible by 3, so ticksMin must itself be divisible by 3 (the smallest such value being 3).

In general, for non-nested tuplets, the value of ticksMin should be divisible by all the required tuplet divisors.
For example, if triplets, quintuplets and septuplets are needed, then ticksMin should be (3 x 5 x 7) = 105.
Similarly, for nested tuplets:
Let dcMin++ be the duration class two levels above dcMin. By definition, this duration class has a minimum tick duration of (ticksMin x 2 x 2). If dividing (ticksMin x 2 x 2) by 3 results in a tick duration that is also divisible by 3, then ticksMin must be divisible by (3 x 3) = 9.
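The rule for ticksMin amounts to taking a least common multiple. A minimal sketch, assuming Python 3.9 or later (for math.lcm); the function name and the way nested tuplets are passed in are hypothetical.

```python
from math import lcm, prod


def ticks_min(simple=(), nested=()):
    """Smallest ticksMin divisible by every required tuplet divisor.

    simple: divisors of non-nested tuplets, e.g. [3, 5, 7].
    nested: chains of divisors for tuplets nested inside each other,
            e.g. [(3, 3)] for a triplet inside a triplet.
    """
    # A nested chain needs the product of its divisors.
    requirements = list(simple) + [prod(chain) for chain in nested]
    return lcm(*requirements)  # lcm() with no arguments returns 1


print(ticks_min(simple=[3, 5, 7]))           # 105
print(ticks_min(simple=[3], nested=[(3, 3)]))  # 9
```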

If dcMin is a demisemiquaver having a tick duration of 105 (allowing simple triplets, quintuplets and septuplets inside semiquavers), the tick duration of a crotchet would be (105 x (2 x 2 x 2)) = 840. That would mean, if the absolute duration of a crotchet was ca. 1 second, that the duration of a single tick would be longer than ca. 1 millisecond, which is undesirable.
To avoid this problem, the default tick durations of all the basic duration classes can simply be multiplied by 2, resulting in the following table:

duration class symbol	default tick duration
demisemiquaver	210
semiquaver	420
quaver	840
crotchet	1680
minim	3360
semibreve	6720
dotted semiquaver	630
dotted crotchet	2520
double dotted crotchet	2940
triple dotted crotchet	3150
etc.
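The choice of multiplier can also be automated. The following sketch doubles the tick values until a single tick is no longer than a chosen limit; the function name, the parameter names and the 1 ms limit are assumptions based on the reasoning above.

```python
def scale_for_resolution(ticks_min, doublings_to_crotchet, crotchet_ms,
                         max_tick_ms=1.0):
    """Return the power-of-2 factor that makes one tick short enough.

    doublings_to_crotchet: number of duration classes between dcMin and
    the crotchet (3 if dcMin is a demisemiquaver).
    """
    factor = 1
    while crotchet_ms / (ticks_min * factor * 2**doublings_to_crotchet) > max_tick_ms:
        factor *= 2
    return factor


factor = scale_for_resolution(105, doublings_to_crotchet=3, crotchet_ms=1000)
crotchet_ticks = 105 * factor * 2**3
print(factor, crotchet_ticks)  # 2 1680
```

With a crotchet of about 1 second, 840 ticks give ticks of about 1.19 ms, so one doubling is needed, which reproduces the crotchet value of 1680 in the table.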

It would be quite feasible to provide general users with a lookup-table defining the default tick durations required in common situations.

3.3: a proposal for MNX (common1900)

This proposal was first made in the (currently closed) GitHub issue. It is very simple:

The duration of events in MNX (common1900) should be measured in ticks.
The millisecond duration of individual ticks should be set globally.
All individual ticks should have durations that are so short that they are imperceptible.

More precisely, in the code sketch below:

The <tickResolution> ticksPerUnit attribute is an integer.
The tick-durations of event symbols define their relative alignments in space.
The <millisecondsPerTick> element can be omitted if playback of the file is not required.
Its value attribute is floating point. Milliseconds are used here because MNX is intended to be compatible with W3C web standards. The Web MIDI API uses floating-point millisecond timestamps.
The globally defined <tickResolution> and <millisecondsPerTick>
- together define the default, absolute duration of a duration class symbol.
  (1 second per crotchet in the example below).
- can be changed, independently of each other, at any tick position in the score.
  (Changing either value redefines the global tick-grid, so immediately affects all current <event>s.)
Derived basic duration classes have the usual, corresponding default tick durations.
In the example below:
- a quaver ("/8") has 1000/2 (=500) ticks
- a dotted crotchet ("/4d") has 1000 + 500 (=1500) ticks
- a minim ("/2") has 1000 x 2 (=2000) ticks
- a semibreve ("/1") has 1000 x 4 (=4000) ticks
- etc.
<event> duration attributes can override an event's default duration.
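The default durations listed above can be derived from the two global values. The sketch below is not MNX code: it assumes that the unit of ticksPerUnit is the crotchet, that ticksPerUnit is 1000 and millisecondsPerTick is 1 (which gives the 1 second per crotchet mentioned above), and that duration classes are written as in the list ("/8", "/4d" etc.). The function names are hypothetical.

```python
import re


def default_ticks(duration_class, ticks_per_crotchet=1000):
    """Default tick duration for a string such as "/8" or "/4d"."""
    match = re.fullmatch(r"/(\d+)(d*)", duration_class)
    if not match:
        raise ValueError(f"unrecognised duration class: {duration_class!r}")
    denominator, dots = int(match.group(1)), len(match.group(2))
    whole = ticks_per_crotchet * 4  # ticks in a semibreve
    if denominator == 0 or whole % denominator:
        raise ValueError("duration class does not fall on the tick grid")
    ticks = extra = whole // denominator
    for _ in range(dots):  # each dot adds half of the previous addition
        if extra % 2:
            raise ValueError("dot does not fall on the tick grid")
        extra //= 2
        ticks += extra
    return ticks


def default_ms(duration_class, ticks_per_crotchet=1000, ms_per_tick=1.0):
    """Default absolute duration, omitted in practice if playback is not needed."""
    return default_ticks(duration_class, ticks_per_crotchet) * ms_per_tick


print(default_ticks("/8"), default_ticks("/4d"))  # 500 1500
print(default_ms("/4"))                           # 1000.0
```

An event-level duration override would simply replace the value returned by default_ticks for that event.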

In addition to abrupt speed changes, gradual speed changes (accel. or rit.) can be defined at any time during the score using a <speedChange> element of some kind: (see this comment)
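One possible reading of a gradual speed change is a linear interpolation of the milliseconds-per-tick value across a range of ticks. The sketch below is only an assumption about how such an element might behave; the function name, its parameters and the linear shape of the ramp are all hypothetical.

```python
def ramp_onsets_ms(tick_times, start_tick, end_tick, ms_start, ms_end):
    """Absolute onsets (ms after start_tick) for tick positions inside a ramp.

    Tick number i of the ramp (counting from 0) lasts
    ms_start + (ms_end - ms_start) * i / (end_tick - start_tick) milliseconds.
    """
    span = end_tick - start_tick

    def elapsed(tick):
        n = tick - start_tick
        # Closed form of the sum of the first n interpolated tick lengths.
        return n * ms_start + (ms_end - ms_start) * n * (n - 1) / (2 * span)

    return [elapsed(t) for t in tick_times]


# Hypothetical accel.: from 1 ms per tick to 0.5 ms per tick over 2000 ticks.
print(ramp_onsets_ms([0, 1000, 2000], 0, 2000, 1.0, 0.5))
# [0.0, 875.125, 1500.25]
```

Because the ramp is a function of tick position alone, events in parallel voices that share a tick.time still sound together, however the speed changes.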

4: Other Notations

4.1: non-standard use of the Common Western event symbols

While CWMN1900 is the most widely used, it is actually just one of a wider class of notations that use the same symbols. In the following examples, the alignments (intended synchronization) can be achieved by setting the appropriate event tick-durations.

Note (again) that changing the events’ tick-durations does not affect the (volatile) ratio between ticks and “absolute” time. Real-time performance practice is unaffected.

4.1.1: Baroque

See The Notation of Time, MNX Issue #74 (closed), MNX Issue #79 (closed), MNX Issue #129 (open)

Couperin: Unmeasured Prelude in G minor:
(Screenshot of Bauyn Ms. taken from John Moraitis’ performance)

Bach: Die Kunst der Fuge:
(Note that a closer approximation to the probable durations of the events inside the duplets could also be achieved by re-allocating their tick durations without changing the “micro-tempo”.)

Bach: Sarabande:

4.1.2: Romantic

There are many examples in Julian Hook: How to Perform Impossible Rhythms.
See also: MNX Issue #79 (closed) and MNX Issue #129 (open).

Schumann, Bunte Blätter, Op. 99, No. 2
See Hook: (Example 18)

Brahms:
See MNX Issue #79.

4.1.3: Ingram: Study 2 (2010-2012)

In this score, the ticks “add up” in each bar, and each event has a tick duration that determines its symbol’s duration class. Duration classes have the tick-duration ranges listed below. This allows the symbols to be distributed in the usual way across systems, and takes advantage of the Common Western duration symbols’ inherent legibility.

minimum ticks	duration class	maximum ticks
101	demisemiquaver	200
201	semiquaver	400
401	quaver	800
801	crotchet	1600
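The mapping from tick duration to duration class is a simple range lookup. A minimal sketch, using only the four rows listed in the table above; the names are chosen here for illustration.

```python
# (minimum ticks, maximum ticks, duration class), as listed in the table.
RANGES = [
    (101, 200, "demisemiquaver"),
    (201, 400, "semiquaver"),
    (401, 800, "quaver"),
    (801, 1600, "crotchet"),
]


def duration_class(ticks):
    """Return the duration class whose tick range contains ticks."""
    for low, high, name in RANGES:
        if low <= ticks <= high:
            return name
    raise ValueError(f"{ticks} ticks is outside the listed ranges")


print(duration_class(450))   # quaver
print(duration_class(1600))  # crotchet
```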

Ingram: Study 2, bars 36-37 (extract)

(In the score's code, the event durations are called “msDuration”, but this is really a misnomer for “ticks”. The performed durations actually depend on the position of the speed slider in the application.)

See also: About Study 2.
This score can be performed on-line using my Assistant Performer web application.

4.2: notations that don’t use Common Western event symbols

The tick.time at which an event symbol is performed is unrelated to its internal graphical structure.

In all monophonic and polyphonic music notations, event symbols are laid out using their external boundaries, in straight line segments whose length is limited by the width (or height) of the page. The temporal sequence corresponds to the line-order and the direction in which the lines are supposed to be read: either horizontal (left-to-right or right-to-left) or vertical (top-to-bottom or bottom-to-top). Polyphonic notations have “systems” containing parallel lines of event symbols that are aligned so that they have a single, common temporal sequence.

The internal graphical structure of an event symbol can be arbitrarily complex, so it makes no difference whether the event symbol is a Common Western chord symbol, ordinary text, a graphic designed for some special purpose, or some kind of Western or Asian tablature.

4.2.1: Shakuhachi notation

Wikipedia contains the following example of a notation for shakuhachi:

Image by Akihito Fuji - https://www.flickr.com/photos/afujii/4742102949/,
CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=16314677

There is an interesting explanation of how to play such scores here on YouTube.

5: Digression: machine-machine communication

Ticks that enable machine-machine communication have a fixed size that allows the machines to be built to a common standard by which they can agree on what the ticks mean. This is in contrast to the imperceptible ticks used in human communication, whose size depends on the volatile processing of information in human memory.
However, MIDI is widely used by computers and computer music applications, so a review here would seem to be in order.

5.1: the use of ticks in Standard MIDI Files (MIDI 1.0)

The screenshots below are of the Standard MIDI-File Format Spec. 1.1, updated. This document is nearly identical to the one that can be downloaded from MIDI.org, but is occasionally a bit clearer.

5.1.1: MIDI 1.0’s basic temporal units

Unfortunately, the MIDI specifications are a bit short on basic definitions, so their meaning has to be gleaned from various parts of the document.
First, note that MIDI never uses spatial units, so it can never be used to describe graphics.
In music notation, the term “quarter-note” is used for a graphical object, so using it to describe a "beat" or "metronome click" is a bit confusing. MIDI 1.0 uses it as a basic unit of time, so I'll continue to use it in this section while using the British terms “crotchet”, “quaver” etc. elsewhere to mean the graphic objects.

The §3.1 Meta-Event Definitions paragraph about the Time-signature Meta-Event says:

For present purposes, this boils down to the following definitions:
a MIDI 1.0 “quarter-note” is a temporal unit that always has 24 MIDI 1.0 clocks
and
a MIDI 1.0 “metronome click” is a temporal unit that can contain any number of MIDI 1.0 clocks

The duration of a MIDI 1.0 clock is set by setting the duration of a MIDI 1.0 “quarter-note” using the Set Tempo Meta-Event (also in §3.1 Meta-Event Definitions):

This paragraph is a bit confused: Good long-term synchronisation is allowed by using 24-bits to define the microsecond duration of a “quarter-note”, not by “Representing tempos as time per beat instead of beat[s] per time”. (Note the typo in the blue box.)
In spite of having this "Set Tempo" Meta-Event, MIDI actually does nothing of the sort. Tempi are never defined, and are even unnecessary in the standard. The Web MIDI API (2015) no longer mentions them at all, but simply gives each MIDI message a timestamp (a floating-point millisecond value).

The tttttt parameter is a 24-bit value, whose maximum value is ((2^24) - 1) = 16,777,215 microseconds, so
the maximum duration of a MIDI 1.0 “quarter-note” is 16.777215 seconds.
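The relationships between the Set Tempo value, the “quarter-note” and the MIDI 1.0 clock can be written out as follows. This is a sketch of the arithmetic only, using the figures given above (a 24-bit microsecond value, 24 clocks per “quarter-note”); the function names are hypothetical.

```python
MAX_TEMPO_US = 2**24 - 1   # largest value the 24-bit tttttt field can hold
CLOCKS_PER_QUARTER = 24    # MIDI 1.0 clocks in a "quarter-note"


def quarter_note_seconds(tttttt):
    """Duration of a "quarter-note" for a Set Tempo value in microseconds."""
    if not 0 < tttttt <= MAX_TEMPO_US:
        raise ValueError("Set Tempo value must fit in 24 bits and be positive")
    return tttttt / 1_000_000


def clock_microseconds(tttttt):
    """Duration of one MIDI 1.0 clock for the same Set Tempo value."""
    return tttttt / CLOCKS_PER_QUARTER


print(quarter_note_seconds(500_000))       # 0.5
print(quarter_note_seconds(MAX_TEMPO_US))  # 16.777215
```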

5.1.2: MIDI 1.0 ticks and divisions

There is a two-byte word in the MIDI header chunk (§2.1 Header Chunks) called division:

First, this is saying that:
A tick is a delta-time: The smallest time increment that can be used in a particular time format.
MIDI has two time formats: metrical time and time-code-based time, so MIDI defines two different tick types.
In metrical time, the divisions value has 15 bits, representing “ticks per quarter-note”, and:
A tick is the smallest performable fraction of a “quarter-note”,
and
divisions has a maximum value of (2^15) - 1 = 32767.
Note that in metrical time, durations have microsecond (1 x 10^-6 second) accuracy.

In time-code-based time,
A tick is the smallest performable fraction of an SMPTE frame (unrelated to “quarter-note”s).

In metrical time, if qSec is the duration of a “quarter-note” in seconds and, as described above, smooth temporal precision requires ticks to be shorter than approximately 1 millisecond, then
divisions should have a minimum value of qSec x 1000.
and (in MIDI)
qSec x 1000 < divisions <= 32767.
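The bounds on divisions can be computed directly. A minimal sketch of the inequality above, treating 1 millisecond as the longest acceptable tick; the function names and the rounding up to a whole number are assumptions.

```python
import math

MAX_DIVISION = 2**15 - 1  # 15-bit "ticks per quarter-note" field


def min_division(q_sec, max_tick_ms=1.0):
    """Smallest divisions value whose ticks are no longer than max_tick_ms."""
    return math.ceil(q_sec * 1000 / max_tick_ms)


def tick_ms(q_sec, division):
    """Duration of one metrical-time tick in milliseconds."""
    return q_sec * 1000 / division


print(min_division(0.5))   # 500
print(tick_ms(0.5, 960))   # a little over half a millisecond
print(min_division(16.777215) <= MAX_DIVISION)  # True
```

Even the longest possible “quarter-note” (about 16.8 seconds) leaves room below the 32767 limit.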

5.1.3: comment

The use of “quarter-note” durations and divisions to define metrical-time ticks in the MIDI specification (1983) was an attempt to integrate CWMN's tempi and tuplets, but with a rather naïve view of CWMN that ignores performance practice.
Trying to ignore performance practice was a common mistake in the 1980s. It was first made by neo-classical composers in the first half of the 20th century, but was perpetuated in the 1950s by the Avant-Garde, who tried to treat CWMN as being "precise" in contrast to graphical notations that were considered to be relatively “free”. (Stockhausen wants the tempi written in his CWMN scores to be performed as accurately as possible.)
MIDI's metrical-time ticks are inappropriate (become impractical), either if there is no strict, mechanical, humanly perceptible tempo, or if CWMN’s duration class symbols (crotchets, quavers etc.) are being used in a non-standard way — see above. When that is the case, ticks have to be defined in some other way.

5.2: MIDI 2.0 ticks and timestamps

MIDI 2.0 introduces Jitter Reduction Timestamps for synchronizing messages across MIDI devices. These timestamps can be prepended to MIDI messages, and use a new MIDI Clock whose resolution (tick size) is 32 microseconds (=0.032ms). The ticks are of constant, absolute size. This means that the timestamps can also be related to absolute timings being used by non-MIDI devices (in DAWs etc.).
Note that the number of ticks (=duration) between each JR Timestamp is not controlled by MIDI, but by (the users of) the soft- or hardware that sends the timestamped MIDI messages.
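Because these ticks have a constant size, converting between them and milliseconds is a fixed scaling. The sketch below uses only the 32 microsecond tick size quoted above; it does not model the timestamp's bit width or any wrap-around, and the function names are hypothetical.

```python
JR_TICK_US = 32  # tick size quoted above, in microseconds


def jr_ticks_to_ms(ticks):
    """Absolute duration, in milliseconds, of a number of constant-size ticks."""
    return ticks * JR_TICK_US / 1000


def ms_to_jr_ticks(ms):
    """Nearest whole number of ticks for a duration in milliseconds."""
    return round(ms * 1000 / JR_TICK_US)


print(jr_ticks_to_ms(1))     # 0.032
print(ms_to_jr_ticks(1000))  # 31250
```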

See the MIDI 2.0 session at the January 2021 NAMM Conference.
(Florian Bomers’ description of JR Timestamps begins at 11' 43".)