American National Standards Institute
ANSI document X3V1.8M/87-35

Journal of Technical Development

Following the meeting of the X3V1.8M Work Group
September 21-24, 1987
New York, New York

Alan D. Talbot
New England Digital Corporation
January 15, 1988

X3V1.8M/87-35 "Journal of Technical Development", Meeting of September 21-24, 1987. Alan D. Talbot, New England Digital Corporation. 27 pp. Received 01/15/88.

{This document may be duplicated and distributed to others. To contribute a document and/or to obtain copies of other ANSI X3V1.8M Standard Music Representation Work Group documents, contact: X3V1.8M Secretariat, c/o Craig R. Harris, The Computer Music Association, P. O. Box 1634, San Francisco, California 94101-1634 USA.}

Contents

Introduction
Scope and Application
Structure and Content
Theory of Use
SGML Representation (Dr. Charles F. Goldfarb)
Summary
Glossary

Introduction

The following is a technical report on the work conducted to date by the ANSI X3V1.8M Work Group. This is a summary of significant technical issues and a statement of the current technical position of the committee. This is a working document which is designed to act as a guide, and a starting place for further work. It is not yet a final or definitive declaration of position on any topic.

The material has been arranged in a logical progression based on subject, and there is a certain amount of repetition from section to section and paragraph to paragraph. This approach should make it easy to assess individual topics without extensive reading of unrelated material. In-depth prior investigation of the subject should not be necessary to use this document, although a basic understanding of the goals of the committee and of computer issues in music is assumed. It is hoped that this document will prove useful both to those who attend the meetings and to others who wish to be informed of our work.
January 22, 1988 - 2 -

All of the ideas expressed here are condensations of work by many people, both those who have attended the meetings and others. I have attempted in places to add material that helps to illuminate the subject, and to synthesize various points of view into a coherent presentation, but it should not be thought that this is to any large extent my work alone.

Scope

Before considering the scope of this Standard, I believe it would be helpful to examine carefully what we mean by a standard. A standard is a language in which certain material will be expressed. It must have the property of being able to express anything that falls within the scope the designers specify. In other words, the scope is the range of material that can be expressed.

A language does not make any demands on the material other than that it be within the scope, nor is there any dynamic aspect to a language. A given example of a piece is static, so the language designer does not have to consider the problem of changing the material. The only criterion is whether both the version before the change and the version after it can be expressed correctly. It is especially easy for those of us who are software designers to fall into the trap of thinking like programmers rather than language designers.

English can be used to demonstrate a language and its scope. English is a language which is ideally suited to writing such material as this document. It also lends itself beautifully to poetry. Mathematics, on the other hand, can only be poorly expressed in English (calculus and algebra work far better), and music cannot be usably represented at all. Clearly some material is within the scope of English, and some is not. English imposes a certain structure (grammar, vocabulary, spelling, etc.), but does not restrain the content of a piece if it falls within the scope. Our language will follow these guidelines.
We will strive to create a language that has sufficient scope to be useful, and which can express the bulk of the material it will be called upon to represent in an elegant and straightforward manner. We will also give due thought to ensuring that most material can be expressed efficiently, but that will not be an initial design criterion. We will not concern ourselves with dynamic efficiency; the modification of material is a consideration for the application software which will produce documents in this language.

In any language, scope is a multidimensional thing. The scope of our language will cover many genres of music, but not every one, and for each genre will cover most but not every instance. We have decided that for the first pass at this problem we will limit the scope to all music which can be reasonably represented in Standard Western Musical Notation.

This is not expected to be limiting for the vast majority of music. We do not exclude the use of special symbols that can be placed in the score, nor of modern notational practices. It is only necessary that the music be representable as notes on a staff, rather than requiring mathematical formulae or a graphic score (such as is sometimes very effectively used to represent computer music works). This in no way implies that a piece must actually have been written down in Standard Notation, or even have been written down at all. The criterion is only that it could be written in a conventional form.

The representational scheme is based on the separation of the basic musical content (pitch, rhythm, harmony, etc.) from the purely performance oriented information (intonation, rhythmic interpretation) and the purely score oriented information (page layout, horizontal spacing, clef). This means simply that some process or machine must be able to separate the work into one or more of these categories for this Standard to represent it.
(These divisions are discussed in detail in the next section.) This is not to say that the piece must originate in a separated form, only that it can be separated for the purpose of encoding in the Standard. While it is possible to imagine pieces which are not separable in this way, almost all works in all genres are in fact easily separable.

What this scope will mean in practice is that pieces which are composed on computer devices, pieces that exist as printed scores, pieces that are performances recorded such that they can be transcribed by machine, and pieces that are already represented in some language, will all be representable in our language. Pieces that have other sources, such as digital recordings, can be associated with pieces in our language, but will not actually be represented in it. They will exist as separate entities on the same level as those that are written in this Standard, but will require some other form of representation.

This scope covers a lot of ground, and it is hoped that it will suffice for the present. We need some boundaries to work with in order to proceed in a reasonably timely fashion. It is not, however, intended ultimately to limit the Standard, and future expansion of the scope will always be possible if it proves desirable. Furthermore, the Standard will be designed to allow user expansion of most aspects, thus hopefully providing the power to handle almost any application.

Structure and Content

The Standard will be based on a hierarchical structure which describes a piece in terms of four basic sections: the underlying musical form, a set of performances (presumably to be reproduced by a machine), a set of scores in the form of Standard Western Music Notation, and a set of theoretical analyses. We feel this structure best reflects the conceptual divisions inherent in music in light of the uses to which the Standard will be put.
These divisions may not represent the philosophically best approach to the expression of musical ideas, but we feel they will be maximally useful. This separation of the whole into performance and score, and the extraction of the logical musical concepts, seems an unavoidable outcome of the way music has come to be performed and notated, and has long been present in Western music.

This hierarchical structure will be codified in terms of elements. Elements are basic structural building blocks, which provide a framework and a means to relate and collect information. Each element has a related information set consisting of attributes. These will contain much of the actual data, as the element itself is basically a place holder. For instance, an event is an element, and may represent a note, in which case it will have attributes describing pitch, duration, and possibly dynamic level. Attributes can be defined by the user as well as the designer. This allows almost unlimited flexibility in representing unusual material that may not have been foreseen during the design.

The remainder of this section is devoted to a detailed definition of each element of the structure, and the information it contains. (A description of the applications of these elements is found in the next section.) Some of the attributes have been defined and are described below, but some have not yet been addressed. The assumption is that every element will have an attribute list, containing at least an identification mark for reference by other elements. Additional items will be added to the attribute list as they are defined, but in the interests of top-down design, we are concentrating on the overall structure first, leaving the myriad and obfuscating details for later.

The names used here were picked in some cases rather arbitrarily and do not imply that we consider this particular terminology to be the best.
Subdividing a score into "parts" seems fairly logical, but the term "thread" refers to a concept which does not have a counterpart in conventional music terminology. Be assured that these names are open to revision at any time, but for now we need handles for various concepts, and these are as good as any.

Designing a standard such as this requires a sophisticated design tool. We have chosen SGML as the best tool for the job. Several other approaches were examined at length during some of the early meetings, but it was decided that unless a dramatically better system is proposed, SGML should be used. The major reasoning behind this is that SGML already exists and provides a powerful tool for designing structured languages. It will allow the easy combination of music and text documents, and will make acceptance of the Standard much smoother since many SGML processing environments already exist.

It is intended that the text describing each element and attribute will be a reasonably complete definition and explanation, but the formal language of the SGML coding provides the rigorous definitions underlying the text descriptions, and will show the mechanism behind each technique that is presented. For this reason, excerpts of the SGML encoding have been interspersed with the text at strategic points. It is recommended that the reader refer to the SGML in the text and in the SGML Representation section while reading the remainder of this section.

For those unfamiliar with SGML, the following brief explanation will assist in understanding the code that has been interspersed with the text. For a more in-depth explanation, the ISO standard (ISO 8879-1986) is the definitive tutorial and reference on the subject.

SGML consists of three basic structural components. It is the usual intent that these structures will contain data, but in our application there is only structure for the moment. (See SGML Representation.)
Elements are structural building blocks which can be defined to contain data or other elements. An attribute list is associated with an element and contains values which describe the element. Entities are a structural tool which allow portions of code to be referenced by a label from one or more places in the code.

There are several punctuation marks that are important. Declarations (definitions) are surrounded by <! ... > and comments to the reader are surrounded by -- ... --. For the purposes of this document, the marks - - and - O can be ignored. In each declaration, the following marks may occur:

  ,   this followed by the next
  &   this and the next
  |   this or the next
  ?   optional
  +   required (one or more)
  *   zero or more

Work

The top level of the hierarchy is the Work. The work encompasses the entire document, and is defined as the logical musical information, and all of the performances, scores, and analyses that stem from that musical information. If a "piece" actually has several versions which differ in basic ways, those versions must each be a separate work. All of the remaining elements are contained within the work.

The Source is an attribute of a work which indicates the form in which the piece originated. It distinguishes between a piece which was captured from a MIDI stream, a piece which was entered from a printed score, and a piece which was composed and entered as logical information.

Work Segment

The Work Segment is a structural device for dividing the work along major boundaries. Movements of a symphony would be placed in separate segments, as would acts in an opera or any other divisions that affect all aspects of the piece (i.e. all parts, all instruments, etc.).

The segment will also be used for making global changes such as key changes, time signature changes and instrumentation changes. If the piece changes key or time signature, that often affects every part and instrument, and indicates a major turning point in the music.
In such cases, the material before the change should be in one segment, and the material after in another.

One very important use of the work segment will be in cases where the instrumentation changes. If the piece starts out with full orchestra, and later proceeds with only strings, then two segments should be used to separate the sections. This will greatly assist in maintaining a useful relationship between the threads in the core, the parts in the score, and the tracks in the performance.

Another use is to indicate the composer's intent. If the composer or the editor wants a major division in the work, the work segment can be used to indicate the division even though none of the above situations apply.

The segment will have attributes which identify what kind of division is being described, and which specify the amount of separation between the segments in terms of time. An example would be a "movement" followed by a 15-minute intermission.

Bibliographic

The Bibliographic entry is found at the top level (as an element of work) and can also be used at lower levels. It contains much of the bibliographic and discographic data necessary for the cataloging of a piece.

We have not attempted to form an exhaustive structure for the representation of complete library cataloging information. Such a structure would extend the scope of the Standard beyond where we feel it should go at present. Since we are utilizing the machinery of SGML to implement this Standard, another committee could easily create such a complete bibliographic element, and it could be readily included in music documents. We in fact strongly urge the library community to initiate such a project.

The bibliographic entry will contain the information necessary to make the Standard useful. Such items as title, author, issuer (publisher), date, and copyright will all be explicitly defined.
In addition, a miscellaneous area will be available which can contain any information that is not defined elsewhere. If desired, a bibliographic entry may be made for each performance in the gestural section, or for each edition in the visual section.

Theme

The theme will contain references to the core which pinpoint key passages (or famous passages) for the purpose of identification of the work. It will allow a cataloging application, for instance, to quickly locate and then display or perform a well known section. This will make it easy for the user to verify that the correct piece has been retrieved.

Core

The Core is the basis for a work, and a work has one and only one core. The core contains such information as pitch, note value, harmonic groupings, phrasings, tuplets, etc. A piece for which a core is not producible cannot be represented, and a piece with more than one core must be represented as more than one work. We will see, however, that several interpretations of the same basic piece can reside in the same work if they derive from the same core.

Let us take the example of a simple piano piece. We have a performance captured by a MIDI sequencer, and the score from which the performance was played. The core will contain an element for each note and rest in the score, thus representing the logical basis of the work. A given measure in the core may contain no notes, and the corresponding spot in the score may say "ad lib". At that point in the performance, there are several improvised notes. It is possible that another performance with a different improvised section, and another score which specifically details a cadenza, might be included in this work and be based on the same core.

It may often be desirable that the core have a canonical (normalized) form. That is, that there be one particular form which will always be used for a given piece.
(Note that the definition of the core does not provide orthogonality, so there are many ways that a given piece could be represented.) For such situations, an algorithm can be applied which translates any arbitrary core into a given canonical form. The user may create such an algorithm to fit the needs of the application, or the Standard Canonical Form can be generated using the Standard Algorithm. We plan to provide this Standard Algorithm both as a way of providing consistency between applications and as a model for other algorithms. The core has an attribute which states whether it has been normalized and, if so, by which algorithm.

Stress

The stress element indicates how a passage is to be stressed dynamically. It consists of a set of groupings that indicate which beats are to receive what stress.

Tempo

The core will also have a tempo element which describes the tempo setting and tempo modification to be applied. It will also have a pause value to indicate full stops of various lengths. (Note that the tempo is quite a separate issue from the time of the sequence.) The tempo element consists of a sequence of settings which describe the changes in tempo over the course of the piece.

The form of the values of these elements will depend on the form in which the information is supplied. If imprecise data is available, such as "presto" or "ritardando", the value will reflect that imprecision. If exact data is available, such as "126 beats per minute" or "slow down 10 beats per minute over 32 beats", then the exact values will be preserved.

The tempo setting will specify the duration of the beat in real time. Settings can be in real time units per beat, or in music terminology. The tempo modification will specify the change in the tempo over time. Modifications can be expressed as precise formulae or in music terminology.

Thread

The Thread is a sequence of musical events which lasts for the duration of the piece.
It is analogous to a track in a sequencer or on a multi-track tape deck. The purpose of the thread is to allow the core to be sectioned into concurrent streams of notes and other events, mostly for the sake of convenience. There is no assumption made about how the piece will be divided into threads, but logic suggests that parts in a score, tracks in a sequence, or voices would be the best choices of thread allocation.

Core Event Sequence

Each thread is made up of core event sequences. A core event sequence is a collection of core events, other core event sequences, and core event groups. An event sequence groups sequential events, as in movements, measures or tuplets. These groups may be nested to any depth and combined in any way.

Core Event Group

The core event group is a collection of events or sequences which are initiated simultaneously. A chord is a group which contains events (notes). A section of a thread may well be a group containing a sequence for each of several parallel voices. This is an alternative to placing each voice in a separate thread.

Core Event

The core event is the basic unit of the structure. Notes and rests are examples of core events, but other occurrences may also be represented as events. In general an event is some occurrence or item which has a single definable starting point in time, and a definable duration.

Core Event References

Core events are accessed through core event references. These are pointers which allow the core to be referred to in arbitrarily complex ways by the performance, score, and analysis sections of the piece. This process will be explored in more depth in Theory of Use. This structure yields a very flexible system for organizing and referring to events.

Time

It is in the core that the time of the piece is represented. By time we mean the rhythmic relationship of each event to all other events.
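As an illustration only, the thread, sequence, group, and event elements described above might be modeled as nested records, each carrying the identification mark that references rely on. The sketch below is in Python rather than SGML, and every class, field, and function name in it (CoreEvent, ident, index_events, and so on) is an assumption made for the example, not part of the Standard. Representing a rest as an event without a pitch is likewise a convention invented here.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Optional, Union


@dataclass
class CoreEvent:
    """A note, rest, or other occurrence with one start point and a duration."""
    ident: str                      # identification mark used by references
    duration: Fraction              # stated in beats
    pitch: Optional[str] = None     # None marks a rest (hypothetical convention)


@dataclass
class CoreEventGroup:
    """Members are initiated simultaneously, as in a chord."""
    ident: str
    members: List["Node"] = field(default_factory=list)


@dataclass
class CoreEventSequence:
    """Members follow one another in time; sequences may nest to any depth."""
    ident: str
    members: List["Node"] = field(default_factory=list)


Node = Union[CoreEvent, CoreEventGroup, CoreEventSequence]


@dataclass
class Thread:
    ident: str
    sequences: List[CoreEventSequence] = field(default_factory=list)


def index_events(node, table=None):
    """Build an ident -> element table so that references can be resolved."""
    table = {} if table is None else table
    table[node.ident] = node
    # Threads hold sequences, sequences and groups hold members, events hold nothing.
    children = getattr(node, "members", getattr(node, "sequences", []))
    for child in children:
        index_events(child, table)
    return table


# A chord followed by a one-beat rest, held in a single thread.
chord = CoreEventGroup("g1", [CoreEvent("e1", Fraction(1), "C4"),
                              CoreEvent("e2", Fraction(1), "E4")])
measure = CoreEventSequence("s1", [chord, CoreEvent("e3", Fraction(1))])
thread = Thread("t1", [measure])
refs = index_events(thread)
```

A reference is then no more than an identification mark looked up in the table, which is what lets other sections point at any level of the hierarchy, from a whole sequence down to a single note.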
Time, in this sense, is not to be confused with tempo, which refers to the rate of progress of the piece. The time model has several components which combine to form a system which we hope will account for any situation within the scope of the Standard.

Beat

All time must be measured in relation to some base which is not open to interpretation. That base will be called the beat. The beat is defined to be that time interval which, at any given point in the piece, is small enough to divide without remainder into all existing subdivisions of the sequence, excluding time anomalies. This beat will only be assigned an absolute value in the gestural section; in the core it is simply a common reference. If the beat changes in meaning as the piece progresses, then the core will be sectioned into more than one sequence. Each sequence will specify the relation of its beat to an overall reference beat.

Since the beat is a relative measurement, the performance can be linked to any time base that is appropriate. The beat can be assigned a fixed duration, an algorithmically generated variable duration, or be related to a live recorded click track. Similarly the score can use any appropriate time signature for a given passage. The same piece could, for example, be scored in 4/4 as triplets or in 12/8 as straight eighths. Indeed, a score representation in each meter could refer to the same core.

Duration

Each core event will have a duration (note value) attribute which is stated as a fraction of a beat. The time consumed by a core event sequence will be the sum of the durations of its events in beats. Accumulated time is therefore represented as the sum of durational time, necessitating the definition of events which sound (notes), and events which do not (rests).

The model will support single events or tied events. Tied events are strings of events which are taken together to represent one event with a duration that is the sum of each of the individual durations.
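The duration rules can be illustrated with a short Python sketch. It assumes, purely for the example, that an event is a tuple of a name, a duration in beats, and a flag saying whether it sounds; the function names are hypothetical as well. Exact fractions are used so that accumulated time never suffers rounding error.

```python
from fractions import Fraction
from itertools import accumulate

# (name, duration in beats, sounds?). Rests advance time without sounding.
sequence = [
    ("n1", Fraction(1), True),
    ("rest", Fraction(1, 2), False),
    ("n2", Fraction(1, 2), True),
    ("n3", Fraction(2), True),
]


def sequence_time(events):
    """Time consumed by a sequence: the sum of its event durations."""
    return sum((dur for _, dur, _ in events), Fraction(0))


def onsets(events):
    """Start of each event in beats, obtained by accumulating durations."""
    durations = [dur for _, dur, _ in events]
    return [Fraction(0)] + list(accumulate(durations))[:-1]


def tied_duration(parts):
    """A tied event lasts as long as the sum of its individual parts."""
    return sum(parts, Fraction(0))


total = sequence_time(sequence)
starts = onsets(sequence)
tie = tied_duration([Fraction(3, 2), Fraction(1, 2)])
```

Because no event stores its own start time, the rest in the second position is what places the note after it; removing the rest would move every later event earlier by half a beat.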
When a note starts sounding in one event sequence and continues into the next, the note is split into two tied events of the appropriate duration.

Time Factor

The time factor is a fraction which describes the relationship of the beat inside a given sequence and the beat surrounding (or underneath) the sequence. Time anomalies (such as triplets) will be represented by setting the time factor to the correct fraction. For example, if the beat of a piece falls on the quarter note (so quarter notes have a time value of 1) and an eighth note triplet is encountered, the triplet could be expressed as a sequence of three notes of value 1 with a time factor of 1/3, or as a sequence of three notes of value 1/2 with a time factor of 2/3.

It may turn out to be desirable to specify that every event sequence must contain an integral (non-fractional) number of beats. This would not be limiting since a common denominator can be found for any situation.

Meter

The concept of the measure will be expressible in two ways, one implicit and one explicit. Using the implicit method, each sequence will be assigned a meter attribute, expressed in terms of beats, which will dictate the placement of measure boundaries. This will allow events to extend across measure boundaries without using tied events. The meter will usually be equivalent to the time signature, with each division indicating a bar line. The inclusion of the meter in the core reflects the philosophy that measures are a basic logical concept in music, rather than strictly a score related issue. This is certainly not true of all music, but the facility must be there for those pieces where it is important.

Using the explicit method, each measure of events will be contained in a sequence. This will necessitate the breaking of events which cross measure boundaries into two tied events, but will allow various structural relationships to be dictated on a measure by measure basis.
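The arithmetic behind the time factor, and the splitting of events at measure boundaries that the explicit method requires, can be checked in a few lines. This is a sketch in Python and not part of the Standard: the names outer_time and split_at_barlines are hypothetical, and the onset is assumed to be counted in beats from the start of a passage whose meter does not change.

```python
from fractions import Fraction


def outer_time(durations, time_factor):
    """Time a sequence occupies, measured in the surrounding beat."""
    return sum(durations, Fraction(0)) * time_factor


def split_at_barlines(onset, duration, meter):
    """Split one event into tied pieces so that none crosses a measure boundary."""
    pieces = []
    while duration > 0:
        room = meter - (onset % meter)   # beats left in the current measure
        piece = min(room, duration)
        pieces.append(piece)
        onset += piece
        duration -= piece
    return pieces


# The two encodings of an eighth note triplet given in the text.
triplet_a = outer_time([Fraction(1)] * 3, Fraction(1, 3))
triplet_b = outer_time([Fraction(1, 2)] * 3, Fraction(2, 3))

# A two-beat note starting on the last beat of a four-beat measure.
tied = split_at_barlines(Fraction(3), Fraction(2), Fraction(4))
```

Both triplet encodings occupy exactly one surrounding beat, which is why either is acceptable. The split yields two pieces of one beat each, one on either side of the bar line, to be joined again as tied events.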
Under the explicit method, the meter attribute will still be present, and need not agree with the amount of time contained in the sequence. (Such agreement could be enforced by the application program.)

A likely application of a combination of these techniques is the case of a measure of five which is felt as two and three. Such a situation could be represented as a sequence with a meter of five (primarily a visual consideration) containing sequences of two and three respectively (a more structural consideration). Thus the essence of the piece is captured at the appropriate levels of the hierarchy. In a case where the division of five into sub-meters is not specified, that part of the structure can simply be omitted, thereby maintaining the desired ambiguity.

It seems likely that the meter will have to be an element of any core event sequence or thread, rather than of the entire core, since it is fairly common to find music which has several parts which are metrically at odds.

Gestural

The gestural section of the piece contains the performances. While each work has only one core, it may have several gestural sections, each a different performance (and hence different interpretation) of the piece, and each linked to a particular score. The gestural section refers to the core for the majority of its musical material, but may have events of its own. Usually these events will be ad lib notes and performance control information such as volume or timbre selection.

The gestural section is intended to represent data for an automated performance of the piece. That data could be generated by a live performance or by non-real-time composition, then returned to a synthesizer for realization.

Track

The track is analogous to the thread in the core. It will be used to drive one channel of sound output, or one instrument. It is the precise counterpart of a track on a multi-track.
Unlike the thread, the division of music into tracks may need to follow certain restraints imposed by the device that will perform the piece. For example a track may have to be limited to events which are to sound in the same timbre.

A track is made up of gestural event sequences, which are made up of gestural events, gestural event references, and core event references. It is through these core event references that the core becomes the basis of the gestural section. While it would be possible through the use of gestural events to represent a performance that was unrelated to the core, the intention is that the track will contain mostly performance control information, and refer to the core for most or all of the notes, rests, and other basic conceptual material.

Click Track

The click track is a gestural event sequence with an event to mark each beat in the piece. It will be the standard technique for relating beats in the core to real time. Click tracks can have arbitrarily spaced events, so any kind of expressive performance can be represented. The click track will usually be generated by a transcription program in the process of creating a work from a live performance. Note that a click track does not need to be present, since a rhythmically exact performance can be generated from the core alone.

Visual

The visual section of the piece contains the scores. While each work has only one core, it may have several visual sections, each a different edition (and hence a different interpretation of the piece), and each linked to a particular performance. The visual section refers to the core for the majority of its musical material, but may have events of its own. Usually these events will be symbols that appear on the score aside from notes, rests, and accidentals. Such items as phrase markings, beams, accents, dynamic markings, and lyrics would be found here.
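Both the gestural and the visual section therefore draw their notes from the one core and add events of their own. A minimal sketch of that arrangement follows, in Python and with invented names throughout: the tagged tuples, the dictionary layout, and the resolve function are assumptions for illustration, not the mechanism the Standard defines.

```python
# The single core of the work, keyed by identification mark.
core = {
    "e1": {"pitch": "C4", "duration": 1},
    "e2": {"pitch": "D4", "duration": 1},
}

# ("ref", ident) points into the core; ("own", data) is local to the section.
track = [("own", {"control": "volume", "value": 80}), ("ref", "e1"), ("ref", "e2")]
part = [("ref", "e1"), ("own", {"symbol": "accent"}), ("ref", "e2")]


def resolve(items, core):
    """Replace each core event reference with the core event it names."""
    return [core[payload] if kind == "ref" else payload
            for kind, payload in items]


played = resolve(track, core)
printed = resolve(part, core)
```

The note that the track plays and the note that the part prints are the same object, so a correction made in the core reaches every performance and every edition that refers to it.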
The visual section is intended to represent the printed score in Standard Western Music Notation. The score could be generated by a music printing system and returned to such a system for printing or display.

Part

The part is analogous to the thread in the core. It will be used to print one part of the score for one instrument. It is the precise counterpart of a staff in a score. The division of music into parts will be based on the desired appearance of the score.

A part is made up of visual event sequences, which are made up of visual events, visual event references, and core event references. It is through these core event references that the core becomes the basis of the visual section. While it would be possible through the use of visual events to represent a score that was unrelated to the core, the intention is that the part will contain mostly visual symbols, and refer to the core for most or all of the notes, rests, and other basic conceptual material.

Space

The unit of space will be defined relative to the size of the staff and note heads. The actual size of the printed staff is not defined except perhaps as a global attribute of the visual section. A unit of one staff space for the vertical and one note head width for the horizontal will provide the basis for all spatial measurements.

Spatial relationship will be representable in several ways: as an absolute position on a line (staff), as a relative position from another object, and as a relative position from a logical (time) position on a staff. Furthermore, for each of these possibilities there will be an explicit position (specified in spatial units) and an implicit position. The implicit position will take the form of a non-numerical relationship to some other object, such as "above the staff" or "between this note head and the one to the left".

Analytical

The analytical section of the piece contains any analyses that may have been produced.
A work may have several analytical sections, each a different analysis (and hence a different interpretation of the piece). The analytical section refers to the core for the majority of its musical material, but may also refer to performances and scores.

The analytical section is intended to represent a structuring of the piece based on any style of analysis. The analysis could be generated by a specialized music printing/editing system and returned to such a system for printing or display, or might take the ultimate form of a written document. It might even be generated automatically by a computer system.

Voice

The voice is analogous to the thread in the core. It will be used to represent one voice or melodic line of the piece. It is the counterpart of a passage of notes that have the same stem direction. The division of music into voices will be based on the voicing of the piece intended by the composer or analyst.

A voice is made up of analytical event sequences, which are made up of analytical events, analytical event references, and core event references. It also can contain gestural event references and visual event references. It is through these references that the analytical section can arbitrarily structure any aspect of the piece in order to illustrate a music theoretical idea.

Theory of Use

Having described the details of the structure of the Standard, I would like to elaborate on its intended use. As a language, it can be put to a wide variety of uses ranging from the highly appropriate to the completely pathological. It was, however, designed with a particular set of applications in mind, and will be most effective if used for these. Knowing the design assumptions will also facilitate application of the Standard to unforeseen or unusual situations. It is hoped that this section will answer many of the questions that will arise concerning applicability.
In general, the Standard is intended as a storage and interchange format for musical ideas. It is designed to be somewhat human-readable so that a piece could theoretically be created by using a word processor and entering the encoded material directly. However, it is expected that it will be used mainly for automated processing in such areas as music printing, library cataloging and storage, multimedia presentations, teaching, and research. For other situations, such as live performance or sound recording, other formats are likely to be more applicable.

A piece to be represented can originate from almost any source. An automated composition program might generate a core and an associated gestural section; an interactive music printing system might generate a core and a visual section. A sequencer might capture a live performance and transcribe it into a core and performance section, and then turn the piece over to a music printing system for the creation of the visual and analytical sections. There is much flexibility in the way the Standard can be used and the situations to which it can be applied. The only common element is the core; the others need not even be present.

The gestural section is designed to be used for the representation of computer instrument sequences. This does not mean that it is a sequencer format for internal use by sequencers. In fact, it would be poorly suited for that application. It is for archiving and transporting music that has been, or will be, processed in some way by a computer system. A performance may be captured on a synthesizer, it may be interpreted from a MIDI stream, or it may be translated from another language, such as a MIDI file sequence format, or MUSIC V. A sequencer might read a piece in the Standard, translate it into an internal data format, and then realize it in real time.

The visual section will be used for representing scores of all kinds.
The score may have an accompanying performance or it may not. The score may be entered or captured using a music printing system, or it may be translated from DARMS or MUSTRAN. It might be retrieved as a display on a screen, a printed page, or translated into another language. Most importantly, it will allow systems of all kinds to interchange scores easily and accurately.

The analytical section will be used to represent theoretical ideas in a structural format. Any sort of layering and grouping will be possible, so various styles of analysis will be supported. A given piece may have several analyses (e.g., one Schenkerian, one classical), which could even refer to each other. An analysis of a piece with a circular score could refer to the score and the performance in an attempt to relate the music to the shape of the score to the vertiginous effect on the performer.

The SGML Representation

The following is an SGML coding which defines rigorously the concepts mentioned above. This is not yet really a draft standard, but does illustrate how the structure can be put into practice quite directly in SGML. It does provide a concrete reference point for continuing work and discussion. There will of course be much revision of this code before any part is considered firm and decided.

It should be noted in reading this code that markup minimization has not been used to any great extent. In any practical application most of the verbiage will be omitted, resulting in an efficiently coded document. There needs to be some careful work done in this area to ensure that the markup minimization is powerful enough to bring the efficiency up to an acceptable level.

It is also evident that the entire content of the Standard has been encoded into a structure, and that there appears to be no actual data to reside within the structure. This has been done to facilitate the design process and to ensure that the results are clear.
It allows relationships to be seen easily and large scale trends and requirements evaluated. Once the design is complete, the lower levels of the structure will be dissolved, leaving a much more balanced system of structure and data and raising the efficiency to an acceptable level. The ability to do this reduction is built into SGML, so the process should not present a problem.

(The remainder of this section was submitted by Dr. Goldfarb.)

Document Type Definition

The following DTD for a musical work is incomplete in a number of respects:

1. Most attributes have not yet been defined. As a result, many of the ATTLIST declarations appear identical to one another. In such cases, we expect that the lists will be differentiated by attributes that will be defined later.

2. The lowest-level elements (events) ge, ve, and ae, are temporary placeholders for lists of distinct element types (for example, bar lines, clefs, etc.). Eventually, the entity references to lists of the distinct types will be completed to replace these element names. For now, only the entity reference for core events (%e.ce;) has no placeholder, but even that list is incomplete. Moreover, as we are first attempting to define those attributes which all events have in common, a single ATTLIST is used in each domain. Eventually, each event may have its own ATTLIST declaration, as was done for core events (note, rest).

3. Little detail is provided on the actual encoding of an instance of a work. As we are first attempting to identify the potential events and to define their properties (attributes), the DTD acts as though all events will be encoded with start-tags and end-tags, with all properties specified using the SGML attribute notation. This convention is satisfactory (even advantageous) while we are designing the structure and semantics of the Standard.
Eventually, we expect to define a concise coding scheme for the lowest level(s) of the structure, perhaps resorting to tags only when there is a change in one of the attributes. (In SGML, such a scheme is known as a "data content notation".)

4. There are many elements that have common content models and, at least for the moment, common attribute lists. As a matter of development methodology, we felt it better to assume that elements that represent different semantic constructs (e.g., tracks and parts) are likely to have different attributes when the design is complete, even though they may have identical structures. If the presumption proves incorrect in any instance, we will of course remove the redundancy when finalizing the design, but premature optimization might cause us to overlook vital differences.

5. For attributes that have been defined, particularly those whose domain is a list of specific values (such as fermata), we have typically provided only a nominal list of values. We expect that once the overall structure is firm, experts will be able to contribute more complete lists. Such attribute domains can also be made user-extensible if that is desirable.

"> %d.bib;

Summary

After five meetings, we feel we have a solid structure to work with. While this structure may change in many details before we are finished, it is comprehensive enough to build from and to support elaboration of any aspect. We believe the basic structure is sound, and will prove powerful enough to handle the job at hand. We must now exercise this structure and determine its strength and usefulness. This will be done by elaborating several key areas, and watching for problems.

We have started this process with the time problem. This was a crucial issue and had to be finished before less difficult issues (such as pitch) could be approached.
We have made considerable progress, and we now feel that we have a model which will suffice. We will continue to develop key areas until we have enough framework to go back and fill in details. As we have been maintaining the SGML code, we have hopes that it will not be long before a draft standard starts to emerge.

Glossary

The following terms are used in a number of places in the text but are not explicitly defined. They are essential to the understanding of the Standard, and have been assigned meanings which differ from common usage.

logical: As used here, we mean the basic musical content of a piece of music, such as the time values, pitch values, and basic groupings such as chords and tuplets.

markup minimization: The elimination of redundant verbiage in the actual representation of a work. SGML has been designed to allow this to happen naturally, so it is not necessary to consider it in the initial design of the Standard.

real time unit: A time unit specified for a given work, usually seconds to several decimal places. The point of allowing the unit to change is to allow pieces with large time scales and small time scales to be accommodated without using unduly large numbers.

SGML: Standard Generalized Markup Language is a text markup language and structured design tool. SGML is an International Standard and is fully defined and described in ISO 8879-1986.

tuplet: A group of notes whose time is out of the context of the surrounding notes. A time anomaly. A triplet, a quintuplet, and a duplet in compound meter are all tuplets.
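As an illustration of the tuplet entry above, the following Python sketch computes the duration of a single note inside a tuplet using exact fractions. It is not part of the Standard. The function name tuplet_note_duration, the choice to express durations as fractions of a whole note, and the five-in-the-time-of-four ratio used for the quintuplet are all assumptions made for this example.

```python
from fractions import Fraction


def tuplet_note_duration(actual: int, normal: int, normal_value: Fraction) -> Fraction:
    """Duration of one note when `actual` notes share the time of `normal` notes.

    `normal_value` is the duration of one ordinary (non-tuplet) note,
    expressed here as a fraction of a whole note (an assumption of this sketch).
    """
    if actual <= 0 or normal <= 0:
        raise ValueError("note counts must be positive")
    # The tuplet as a whole occupies normal * normal_value; split it evenly.
    return normal_value * normal / actual


EIGHTH = Fraction(1, 8)  # an eighth note as a fraction of a whole note

# Triplet: three eighths in the time of two.
triplet = tuplet_note_duration(3, 2, EIGHTH)

# Quintuplet: five eighths in the time of four (ratio assumed for illustration).
quintuplet = tuplet_note_duration(5, 4, EIGHTH)

# Duplet in compound meter: two eighths in the time of three.
duplet = tuplet_note_duration(2, 3, EIGHTH)

print(triplet, quintuplet, duplet)
```

Exact fractions are used so that the notes of a tuplet always sum to the span they replace, which floating point seconds would not guarantee.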