This spec defines the 8-bit binary data stream used in the file. The data can be stored in a binary file, nibbleized, 7-bit-ized for efficient MIDI transmission, converted to Hex ASCII, or translated symbolically to a printable text file. This spec addresses what's in the 8-bit stream.
This proposal defines two types of chunks: a header chunk and a track chunk. A header chunk provides a minimal amount of information pertaining to the entire MIDI file. A track chunk contains a sequential stream of MIDI data which may contain information for up to 16 MIDI channels. The concepts of multiple tracks, multiple MIDI outputs, patterns, sequences, and songs may all be implemented using several track chunks.
A MIDI file always starts with a header chunk, and is followed by one or more track chunks.
Some numbers in MTrk chunks are represented in a form called a variable-length quantity. These numbers are represented 7 bits per byte, most significant bits first. All bytes except the last have bit 7 set, and the last byte has bit 7 clear. If the number is between 0 and 127, it is thus represented exactly as one byte.
Here are some examples of numbers represented as variable-length quantities:
Number (hex)    Representation (hex)
00000000        00
00000040        40
0000007F        7F
00000080        81 00
00002000        C0 00
00003FFF        FF 7F
00004000        81 80 00
00100000        C0 80 00
001FFFFF        FF FF 7F
00200000        81 80 80 00
08000000        C0 80 80 00
0FFFFFFF        FF FF FF 7F

The largest number which is allowed is 0FFFFFFF so that the variable-length representation must fit in 32 bits in a routine to write variable-length numbers. Theoretically, larger numbers are possible, but 2 x 10^8 96ths of a beat at a fast tempo of 500 beats per minute is four days, long enough for any delta-time!
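The encoding rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification; the function names write_vlq and read_vlq are hypothetical.

```python
def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value out of range for a variable-length quantity")
    out = [value & 0x7F]                   # last byte: bit 7 clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: bit 7 set
        value >>= 7
    return bytes(reversed(out))            # most significant bits first


def read_vlq(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a variable-length quantity starting at data[pos].

    Returns (value, position of the first byte after the quantity).
    """
    value = 0
    for _ in range(4):                     # at most four bytes are allowed
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                # bit 7 clear marks the last byte
            return value, pos
    raise ValueError("variable-length quantity longer than four bytes")


print(write_vlq(0x00004000).hex(" "))
print(read_vlq(bytes.fromhex("FF FF FF 7F")))
```

The range check in write_vlq enforces the 0FFFFFFF limit, and the four-byte cap in read_vlq is the matching guard on the reading side.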
Here is the syntax of an MTrk chunk:
<meta-event> specifies non-MIDI information useful to this format or to sequencers, with this syntax:
<sysex event> is used to specify a MIDI system exclusive message, or as an "escape" to specify any arbitrary bytes to be transmitted. Unfortunately, some synthesizer manufacturers specify that their system exclusive messages are to be transmitted as little packets. Each packet is only part of an entire syntactical system exclusive message, but the times they are transmitted at are important. Examples of this are the bytes sent in a CZ patch dump, or the FB-01's "system exclusive mode" in which microtonal data can be transmitted. To be able to handle situations like these, two forms of <sysex event> are provided:
(New to 0.06) A syntactic system exclusive message must always end with an F7, even if the real-life device didn't send one, so that you know when you've reached the end of an entire sysex message without looking ahead to the next event in the MIDI file. This principle is repeated and illustrated in the paragraphs below.
The vast majority of system exclusive messages will just use the F0 format. For instance, the transmitted message F0 43 12 00 07 F7 would be stored in a MIDI file as F0 05 43 12 00 07 F7. As mentioned above, it is required to include the F7 at the end so that the reader of the MIDI file knows that it has read the entire message.
For special situations when a single system exclusive message is split up, with parts of it being transmitted at different times, such as in a Casio CZ patch transfer, or the FB-01's "system exclusive mode", the F7 form of sysex event is used for each packet except the first. None of the packets would end with an F7 except the last one, which must end with an F7. There also must not be any transmittable MIDI events in between the packets of a multi-packet system exclusive message. Here is an example: suppose the bytes F0 43 12 00 were to be sent, followed by a 200-tick delay, followed by the bytes 43 12 00 43 12 00, followed by a 100-tick delay, followed by the bytes 43 12 00 F7. This would be in the MIDI file:
F0 03 43 12 00
81 48                      200-tick delta-time
F7 06 43 12 00 43 12 00
64                         100-tick delta-time
F7 04 43 12 00 F7

The F7 event may also be used as an "escape" to transmit any bytes whatsoever, including real-time bytes, song pointer, or MIDI Time Code, which are not permitted normally in this specification. No effort should be made to interpret the bytes used in this way. Since a system exclusive message is not being transmitted, it is not necessary or appropriate to end the F7 event with an F7 in this case.
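As a rough sketch of how a writer might produce the multi-packet example above, the following Python builds each event as a status byte, a variable-length byte count, and the bytes to transmit. The helper names (write_vlq, sysex_event) are hypothetical and exist only for this illustration.

```python
def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def sysex_event(payload: bytes, continuation: bool = False) -> bytes:
    """Build a sysex event without its delta-time.

    payload holds the bytes that follow the status byte. The F0 form is
    used for the first (or only) packet, the F7 form for later packets
    and for "escape" data.
    """
    status = 0xF7 if continuation else 0xF0
    return bytes([status]) + write_vlq(len(payload)) + payload


# The three packets of the example, with the delta-times between them.
stream = (
    sysex_event(bytes.fromhex("43 12 00"))
    + write_vlq(200)
    + sysex_event(bytes.fromhex("43 12 00 43 12 00"), continuation=True)
    + write_vlq(100)
    + sysex_event(bytes.fromhex("43 12 00 F7"), continuation=True)
)
print(stream.hex(" ").upper())
```

Note that the caller supplies the closing F7 as part of the last payload; the sketch does not add it automatically, since escape data sent with the F7 form must not end with one.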
0    the file contains a single multi-channel track
1    the file contains one or more simultaneous tracks (or MIDI outputs) of a sequence
2    the file contains one or more sequentially independent single-track patterns

The next word, ntrks, is the number of track chunks in the file. The third word, division, is the division of a quarter-note represented by the delta-times in the file. (If division is negative, it represents the division of a second represented by the delta-times in the file, so that the track can represent events occurring in actual time instead of metrical time. It is represented in the following way: the upper byte is one of the four values -24, -25, -29, or -30, corresponding to the four standard SMPTE and MIDI time code formats, and represents the number of frames per second. The second byte (stored positive) is the resolution within a frame: typical values may be 4 (MIDI time code resolution), 8, 10, 80 (bit resolution), or 100. This system allows exact specification of time-code-based tracks, but also allows millisecond-based tracks by specifying 25 frames/sec and a resolution of 40 units per frame.)
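The two interpretations of the division word can be sketched as follows. This is an illustrative reading of the paragraph above, assuming division has already been read as an unsigned 16-bit value; the function name and the dictionary keys are hypothetical.

```python
def parse_division(division: int) -> dict:
    """Interpret the 16-bit division word of the header chunk."""
    if not 0 <= division <= 0xFFFF:
        raise ValueError("division must fit in 16 bits")
    if division & 0x8000 == 0:
        # Positive: delta-time units per quarter-note (metrical time).
        return {"kind": "metrical", "ticks_per_quarter_note": division}
    # Negative: the upper byte is a two's-complement frame rate
    # (-24, -25, -29, or -30); the lower byte is the units per frame.
    upper = division >> 8
    return {
        "kind": "timecode",
        "frames_per_second": 256 - upper,   # e.g. 0xE7 -> 25
        "ticks_per_frame": division & 0xFF,
    }


print(parse_division(96))
print(parse_division(0xE728))   # 25 frames/sec, 40 units per frame
```

With 25 frames per second and 40 units per frame, each delta-time unit is one millisecond, which is the millisecond-based case mentioned above.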
Format 0, that is, one multi-channel track, is the most interchangeable representation of data. One application of MIDI files is a simple single-track player in a program which needs to make synthesizers make sounds, but which is primarily concerned with something else such as mixers or sound effect boxes. It is very desirable to be able to produce such a format, even if your program is track-based, in order to work with these simple programs. On the other hand, perhaps someone will write a format conversion from format 1 to format 0 which might be so easy to use in some setting that it would save you the trouble of putting it into your program.
Programs which support several simultaneous tracks should be able to save and read data in format 1, a vertically one-dimensional form, that is, as a collection of tracks. Programs which support several independent patterns should be able to save and read data in format 2, a horizontally one-dimensional form. Providing these minimum capabilities will ensure maximum interchangeability.
MIDI files can express tempo and time signature, and they have been chosen to do so for transferring tempo maps from one device to another. For a format 0 file, the tempo will be scattered through the track and the tempo map reader should ignore the intervening events; for a format 1 file, the tempo map must (starting in 0.04) be stored as the first track. It is polite to a tempo map reader to offer your user the ability to make a format 0 file with just the tempo, unless you can use format 1.
All MIDI files should specify tempo and time signature. If they don't, the time signature is assumed to be 4/4, and the tempo 120 beats per minute. In format 0, these meta-events should occur at least at the beginning of the single multi-channel track. In format 1, these meta-events should be contained in the first track. In format 2, each of the temporally independent patterns should contain at least initial time signature and tempo information.
We may decide to define other format IDs to support other structures. A program reading an unfamiliar format ID should return an error to the user rather than trying to read further.
FF 00 02 ssss Sequence Number
This optional event, which must occur at the beginning of a track,
before any nonzero delta-times, and before any transmittable MIDI
events, specifies the number of a sequence. The number in this track
corresponds to the sequence number in the new Cue message discussed at
the summer 1987 MMA meeting. In a format 2 MIDI file, it is used to
identify each "pattern" so that a "song" sequence can use the Cue message
to refer to the patterns. If the ID numbers are omitted, the sequences'
locations in order in the file are used as defaults. In a format 0 or 1
MIDI file, which contains only one sequence, this number should be
contained in the first (or only) track. If transfer of several
multitrack sequences is required, this must be done as a group of format
1 files, each with a different sequence number.
FF 01 len text Text Event
Any amount of text describing anything. It is a good idea to put a text
event right at the beginning of a track, with the name of the track, a
description of its intended orchestration, and any other information
which the user wants to put there. Text events may also occur at other
times in a track, to be used as lyrics, or descriptions of cue points.
The text in this event should be printable ASCII characters for maximum
interchange. However, other character codes using the high-order bit
may be used for interchange of files between different programs on the
same computer which supports an extended character set. Programs on a
computer which does not support non-ASCII characters should ignore those
characters.
Meta event types 01 through 0F are reserved for various types of text events, each of which meets the specification of text events (above) but is used for a different purpose:
FF 02 len text Copyright Notice
Contains a copyright notice as printable ASCII text. The notice should
contain the characters (C), the year of the copyright, and the owner of
the copyright. If several pieces of music are in the same MIDI file,
all of the copyright notices should be placed together in this event so
that it will be at the beginning of the file. This event should be the
first event in the first track chunk, at time 0.
FF 03 len text Sequence/Track Name
If in a format 0 track, or the first track in a format 1 file, the name
of the sequence. Otherwise, the name of the track.
FF 04 len text Instrument Name
A description of the type of instrumentation to be used in that track.
May be used with the MIDI Prefix meta-event to specify which MIDI
channel the description applies to, or the channel may be specified as
text in the event itself.
FF 05 len text Lyric
A lyric to be sung. Generally, each syllable will be a separate lyric
event which begins at the event's time.
FF 06 len text Marker
Normally in a format 0 track, or the first track in a format 1 file.
The name of that point in the sequence, such as a rehearsal letter or
section name ("First Verse", etc.).
FF 07 len text Cue Point
A description of something happening on a film or video screen or stage
at that point in the musical score ("Car crashes into house", "curtain
opens", "she slaps his face", etc.)
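The text-type meta-events above all share one layout: FF, the type byte, a length, and the text. A minimal Python sketch follows, assuming (as in the full specification) that len is stored as a variable-length quantity and that the text is restricted to ASCII for maximum interchange. The function names are hypothetical.

```python
def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def text_meta_event(text: str, meta_type: int = 0x01) -> bytes:
    """Build a text-type meta-event (types 01 through 0F), without delta-time."""
    if not 0x01 <= meta_type <= 0x0F:
        raise ValueError("text meta-event types are 01 through 0F")
    data = text.encode("ascii")   # raises UnicodeEncodeError on non-ASCII text
    return bytes([0xFF, meta_type]) + write_vlq(len(data)) + data


print(text_meta_event("Piano", 0x03).hex(" ").upper())       # Sequence/Track Name
print(text_meta_event("First Verse", 0x06).hex(" ").upper()) # Marker
```

A program that wants to allow an extended character set would replace the "ascii" codec with whatever encoding its platform uses, at the cost of interchangeability.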
FF 2F 00 End of Track
This event is not optional. It is included so that an exact ending
point may be specified for the track, so that it has an exact length,
which is necessary for tracks which are looped or concatenated.
FF 51 03 tttttt Set Tempo, in microseconds per MIDI quarter-note
This event indicates a tempo change. Another way of putting
"microseconds per quarter-note" is "24ths of a microsecond per MIDI
clock". Representing tempos as time per beat instead of beat per time
allows absolutely exact long-term synchronization with a time-based sync
protocol such as SMPTE time code or MIDI time code. The accuracy
provided by this tempo resolution allows a four-minute piece at
120 beats per minute to be accurate within 500 usec at the end of the
piece. Ideally, these events should only occur where MIDI clocks would
be located; this convention is intended to guarantee, or at least
increase the likelihood of, compatibility with other synchronization
devices so that a time signature/tempo map stored in this format may
easily be transferred to another device.
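To make the time-per-beat representation concrete, here is a small Python sketch that builds a Set Tempo event from a tempo in beats per minute and converts delta-time units to seconds. It assumes a metrical division (units per quarter-note) and a tempo that stays constant over the span being converted; the function names are hypothetical.

```python
def set_tempo_event(bpm: float) -> bytes:
    """Build a Set Tempo meta-event (without delta-time) for a tempo in BPM."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    usec_per_quarter = round(60_000_000 / bpm)
    if not 0 < usec_per_quarter <= 0xFFFFFF:
        raise ValueError("tempo does not fit in the 24-bit tttttt field")
    return bytes([0xFF, 0x51, 0x03]) + usec_per_quarter.to_bytes(3, "big")


def ticks_to_seconds(ticks: int, usec_per_quarter: int, ticks_per_quarter: int) -> float:
    """Convert delta-time units to seconds at a constant tempo."""
    return ticks * usec_per_quarter / (ticks_per_quarter * 1_000_000)


# The default tempo of 120 beats per minute is 500000 usec per quarter-note.
print(set_tempo_event(120).hex(" ").upper())
print(ticks_to_seconds(96, 500_000, 96))
```

Because the stored value is an integer number of microseconds, tempos such as 120 BPM are represented exactly, while others are rounded to the nearest microsecond per quarter-note.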
FF 54 05 hr mn se fr ff SMPTE Offset
This event, if present, designates the SMPTE time at which the track
chunk is supposed to start. It should be present at the beginning of
the track, that is, before any nonzero delta-times, and before any
transmittable MIDI events. The hour must be encoded with the SMPTE
format, just as it is in MIDI Time Code. In a format 1 file, the SMPTE
Offset must be stored with the tempo map, and has no meaning in any of
the other tracks. The ff field contains fractional frames, in 100ths of
a frame, even in SMPTE-based tracks which specify a different frame
subdivision for delta-times.
FF 58 04 nn dd cc bb Time Signature
The time signature is expressed as four numbers. nn and dd represent
the numerator and denominator of the time signature as it would be
notated. The denominator is a negative power of two: 2 represents a
quarter-note, 3 represents an eighth-note, etc. The cc parameter
expresses the number of MIDI clocks in a metronome click. The bb
parameter expresses the number of notated 32nd-notes in a MIDI quarter-
note (24 MIDI Clocks). This was added because there are already
multiple programs which allow the user to specify that what MIDI thinks
of as a quarter-note (24 clocks) is to be notated as, or related to in
terms of, something else.
Therefore, the complete event for 6/8 time, where the metronome clicks every three eighth-notes, but there are 24 clocks per quarter-note, 72 to the bar, would be (in hex):
FF 58 04 06 03 24 08

That is, 6/8 time (8 is 2 to the 3rd power, so this is 06 03), 36 MIDI clocks per dotted-quarter (24 hex!), and eight notated 32nd-notes per MIDI quarter-note.
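The 6/8 example can be reproduced with a short Python sketch. The function and parameter names are hypothetical; the only logic involved is converting the notated denominator to its power-of-two exponent.

```python
def time_signature_event(numerator: int, denominator: int,
                         clocks_per_click: int,
                         thirty_seconds_per_quarter: int = 8) -> bytes:
    """Build a Time Signature meta-event (without delta-time).

    denominator is the notated value (4 for quarter-note, 8 for
    eighth-note, ...); it is stored as the exponent dd of a power of two.
    """
    dd = denominator.bit_length() - 1
    if denominator <= 0 or (1 << dd) != denominator:
        raise ValueError("denominator must be a power of two")
    return bytes([0xFF, 0x58, 0x04,
                  numerator, dd, clocks_per_click, thirty_seconds_per_quarter])


# 6/8 with a metronome click every dotted quarter (0x24 MIDI clocks).
print(time_signature_event(6, 8, 0x24).hex(" ").upper())
```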
FF 59 02 sf mi Key Signature
sf = -7: 7 flats
sf = -1: 1 flat
sf = 0: key of C
sf = 1: 1 sharp
sf = 7: 7 sharps

mi = 0: major key
mi = 1: minor key

FF 7F len data Sequencer-Specific Meta-Event
The contents of the MIDI stream represented by this example are broken down here:
To make the protocol efficient, the MIDI transmission of these files will take groups of seven 8-bit bytes and transmit them as eight 7-bit MIDI data bytes. This is certainly in the spirit of the rest of this format (keep it small, because it's not that hard to do). To accommodate a wide range of transmission speeds, files will be transmitted in packets with acknowledge -- this allows data to be stored to disk as it is received. If the sender does not receive a response from a reader in a certain amount of time, it can assume an open-loop situation, and then just continue.
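The text above does not say how the bits of each seven-byte group are arranged within the eight MIDI data bytes. The Python sketch below shows one possible arrangement, purely as an assumption for illustration: each group is preceded by a byte that collects the high bits of the bytes that follow. The function names are hypothetical.

```python
def pack_7bit(data: bytes) -> bytes:
    """Pack 8-bit bytes into 7-bit MIDI data bytes.

    ASSUMED layout (not taken from the specification): for each group of
    up to seven bytes, emit one byte holding their high bits (bit i is
    the high bit of the i-th byte), then the bytes with bit 7 cleared.
    """
    out = bytearray()
    for start in range(0, len(data), 7):
        group = data[start:start + 7]
        high_bits = 0
        for i, byte in enumerate(group):
            high_bits |= (byte >> 7) << i
        out.append(high_bits)
        out.extend(byte & 0x7F for byte in group)
    return bytes(out)


def unpack_7bit(packed: bytes) -> bytes:
    """Reverse pack_7bit, restoring the original 8-bit bytes."""
    out = bytearray()
    for start in range(0, len(packed), 8):
        high_bits = packed[start]
        for i, byte in enumerate(packed[start + 1:start + 8]):
            out.append(byte | (((high_bits >> i) & 1) << 7))
    return bytes(out)


original = bytes([0x4D, 0x54, 0x68, 0x64, 0x00, 0xFF, 0x80, 0x90, 0x3C])
packed = pack_7bit(original)
print(packed.hex(" ").upper())
print(unpack_7bit(packed) == original)
```

Whatever layout is chosen, every transmitted byte has bit 7 clear, so the data cannot be mistaken for a MIDI status byte, and the overhead is one extra byte per seven.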
The last edition of MIDI Files contained a specialized protocol for sending just MIDI Files. To meet a deadline, unfortunately I don't have time right now to propose a new generalized protocol. This will be done within the next couple of months. I would welcome any proposals anyone else has, and would direct your attention to the proposal from Ralph Muha of Kurzweil, available in a recent MMA bulletin, and also directly from him.