A Comparison of the Network Time Protocol and Digital Time Service
David L. Mills, University of Delaware, 12 February 1990

Review by Joe Comuzzi, DEC
Further Commentary by Dave Mills, UDel
18 March 1990

Following is a review and commentary on the above document, which is available in the file pub/ntp/dts.txt on louie.udel.edu. This document is available in the file pub/ntp/dtsrev.txt on the same host. The original document is based on the DTS specification version T1.0.5 dated 18 December 1989, which I assume can be obtained from DEC.

At my suggestion Joe Comuzzi of DEC thoroughly and incisively reviewed my document comparing NTP and DTS. He found some agreement, some disagreement and some errors on my part. I much appreciate the time and care this effort required. In the same spirit, I have reviewed his comments and responded with comments of my own. As time permits I intend to incorporate appropriate revisions into the body of the original document and submit it for wider distribution. Meanwhile, I offer the following discourse for further comment and evaluation. Personally, I have found the exchange useful, stimulating and suggestive of further refinements to NTP.

The following discourse includes only those portions of the original document that are relevant to the reviewer's comments. These are indented three spaces. The reviewer's comments are flush with the left margin. These comments are included in their entirety and are unedited. My reply comments are preceded by a ">" symbol. References to the latest specification are to RFC-1119, with the exception of the mention of new appendices in the revised version of February 1990, which can be found in the PostScript file pub/ntp/ntp.ps on louie.udel.edu.

-------------------------------------------------------------------------

   The Digital Time Service (DTS) for the Digital Network Architecture
   (DECnet) is intended to synchronize time in computer networks ranging
   in size from local to wide-area.
You seem to be trying to clothe DTS in a proprietary cloth. We now refer to DECnet as DECnet/OSI since we've incorporated OSI protocols into the protocol stack. It is our intention to pursue DTS in the OSI standards forums.

> I have no intent to clothe DTS in anything other than explicitly stated
> on the cover and introduction to the spec document. There it says "DNA
> Phase 5 network." I will be glad to preach any other gospel or creed
> practiced by DEC's men of cloth if you will change the cover and
> introduction to the spec.

   As such it is intended to provide service comparable to the Network
   Time Protocol (NTP) for the Internet architecture.

While both are clearly addressing the same problem space, DTS and NTP have VERY different goals. I recently spoke to the president of a time-provider manufacturer and I liked his jargon; he distinguished between the time-of-day market and the frequency market. The time-of-day market wants to know what time it is; it is not interested in small errors and it doesn't want to pay a lot. The frequency market wants stable frequency sources, needs high stability and is willing to pay.

> I didn't know the time providers distinguished between the time-of-day
> market and frequency market. Certainly their customers don't know the
> difference. No timecode receivers known to me have the requisite
> stability to be considered primary frequency providers in any case;
> that's what rubidium and cesium standards are for. I do not understand
> the basis for your conclusion that accurate frequency costs more than
> accurate time. While the algorithms are somewhat more complicated and the
> host-clock implementation must be more rigidly specified, this does not
> necessarily cost more, especially if there is almost a decade of research
> in refining the methodology.

NTP is a solution for the frequency market. DTS is only interested in the time-of-day market.
The major cost for these solutions is not the initial capital investment, but the long-term management and operation cost. As such DTS has goals of auto-configurability and ease of management which are not present in NTP.

> If you are convinced that accurate, reliable time-of-day service can be
> achieved without consideration for frequency and believe that errors as
> much as several seconds per day in the absence of connectivity are
> acceptable, then I won't argue with DTS being a reasonable approach. I
> accept that NTP has goals primarily of stability, accuracy and
> reliability and secondarily of configurability and ease of management,
> since other Internet protocols would be expected to provide those
> functions (see below).

> (portion deleted)

   The goal of a distributed timekeeping service such as NTP and DTS is
   to synchronize the clocks in all participating servers and clients so
   that all are correct, indicate the same time relative to UTC, and
   maintain specified measures of stability, accuracy and reliability.

As stated above, DTS is addressing the time-of-day market; hence high frequency stability is not a goal of DTS.

> Do you mean that "specified measures of stability, accuracy and
> reliability" do not apply to DTS? Should I specifically point out that
> stability is a non-goal of DTS? A stability bound is in fact an
> architectural constant "maxDrift" in DTS, which sounds like a
> "specified measure" to me.

> (portion deleted)

   Servers, both primary and secondary, typically run NTP with several
   other servers at the same or lower stratum levels; however, a
   selection algorithm attempts to select the most accurate and reliable
   server or set of servers from which to actually synchronize the local
   clock. The selection algorithm, described in more detail later in this
   document, uses a maximum-likelihood clustering algorithm to determine
   the best from among a number of possible servers.
   The synchronization subnet itself is automatically constructed from
   among the available paths using the distributed Bellman-Ford routing
   algorithm [BER87], in which the distance metric is modified hop count.

Note that in DTS loops are not a problem: if a system sends out a time and ultimately gets back a derived time, due to the communication delays the derived time will always arrive back with a larger inaccuracy. The only exception to this is the possibility of a system with a time provider and a lousy clock. Then the derived time's inaccuracy could be smaller if the time was parked in a system with a good clock. But in this case the network clearly has information that the original system has lost.

> It would seem that the strategy to avoid subnet loops is similar in both
> NTP and DTS, although in NTP the metric is stratum (hop count) and in
> DTS it is the inaccuracy interval (is there a better word than
> "inaccuracy" with a more positive connotation?) Both NTP and DTS
> appear to operate in similar ways to cast out noisy timecode receivers,
> although it is not clear to me how the DTS manager determines from the
> protocol and the radio what the inaccuracy interval should be. Both NTP
> and DTS model the receiver similar to an ordinary peer, presumably with
> smallest inaccuracy interval or lowest stratum. In principle, both could
> estimate these and related information directly from the timecode
> samples.

> (portion deleted)

   The NTP specification includes no architected procedures for servers
   to obtain addresses of other servers other than by configuration
   files and public bulletin boards.

This is a serious shortcoming of NTP and definitely makes it harder to manage. It is unclear to me why you haven't fixed this, since it would not seem that difficult to store server names in a namespace.
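The hop-count discipline discussed above can be illustrated with a toy sketch (names and data structures are purely illustrative, not from either specification): each server adopts the neighbor advertising the lowest stratum and then operates one stratum below it, the same rule that lets a distributed Bellman-Ford computation converge without loops.

```python
# Toy sketch of stratum (hop-count) peer selection, as used to build
# the NTP synchronization subnet. Illustrative only.

def select_peer(peers):
    """Pick the peer advertising the lowest stratum; our own stratum
    becomes that peer's stratum + 1, which bounds path length and
    prevents timing loops the way hop counts prevent routing loops."""
    best = min(peers, key=lambda p: p["stratum"])
    return best, best["stratum"] + 1

peers = [
    {"name": "srv-a", "stratum": 2},
    {"name": "srv-b", "stratum": 1},   # closest to a primary source
    {"name": "srv-c", "stratum": 3},
]
best, my_stratum = select_peer(peers)
# best is srv-b, and this host then operates at stratum 2
```

The real NTP selection weighs synchronization distance as well as stratum; this sketch shows only the loop-avoidance metric.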
> There are three issues here: (1) how to discover a set of time-servers
> which are potentially useful peers, (2) how to intelligently select
> an appropriate subset, based on performance expectations and (3) how
> to translate names to addresses. Internet protocols are notoriously
> weak on (1) and (2); however, (3) is a non-issue with NTP, since all
> NTP daemons use the Internet DNS to resolve addresses from names and in
> principle could use the DNS to discover servers (WKS records). For (1)
> now, there is a master file on an obscure host, which is updated
> haphazardly at irregular intervals using completely unauthenticated
> data obtained from unreliable sources. Issue (2) is Real Hard when
> the number of potential peers runs in the thousands and considerations
> of network overhead, access policy and export control (drat DES) are
> involved.

> DTS uses LAN discovery protocols and automatic global server registration
> in a global database, which vastly simplifies (1) and (3); however, I
> submit that, as DTS gets bigger, (2) will become as hard in DTS as it
> has in NTP. For instance, survey evidence suggests there are over 2000
> hosts supporting NTP and potentially available as servers registered
> in the DNS. Using the DTS model that flushes the server list every 12
> hours and expects that every server and clerk maintains the entire
> list, one might expect a good deal of network clanking, unless the list
> were pruned and stratified as a cooperative management exercise.

   While servers passively respond to requests from other servers, they
   must be configured in order to actively probe other servers. Servers
   configured as active poll other servers continuously, while servers
   configured as passive poll only when polled by another server. There
   are no provisions in the present protocol to dynamically activate
   some servers should other servers fail.

This is harder to fix and interacts with the spanning tree.
Here at least I can see why you didn't make it easier to manage. These problems make NTP a system administrator's nightmare, but are consistent with the two different sets of goals. Consistent with DTS goals we've accepted some "clock hopping" in exchange for ease of management.

> I'm not sure what you mean by "nightmare." Most NTP administrators
> snarf a copy of one of the two Unix daemons, compile it locally,
> make an uneducated guess which existing server(s) in the master list to
> use based on advice included in the distribution, build a simple
> configuration file, turn the keys and walk away. In fact, DEC is
> presently distributing NTP with Ultrix and includes a five-page writeup
> on how to do this; which, although not an engineered solution, would
> not ordinarily be considered a nightmare.

> While you and I might consider NTP configuration crude, it is really no
> better or worse than bringing up a j-random router or DNS server. In DTS
> clients and servers wake up once in a while and solicit time in
> connectionless mode on LANs and connection mode on WANs, while in NTP
> peers solicit time continuously at controlled rates in connectionless
> mode. On the issue of "dynamically activate," it appears that DTS does
> just that with backup couriers in order to minimize WAN overhead.
> This is a good thing and should be done in NTP. Dynamic activation is
> on my list, but not above integration with the IP multicast service.

   In response to stated needs for security features, NTP includes an
   optional cryptographic authentication mechanism. NTP also includes an
   optional comprehensive remote monitoring mechanism found necessary
   for the detection and repair of various problems in server and
   network configuration and operation. It is anticipated that, when
   generic features capable of these functions have been developed and
   deployed in the Internet, the NTP authentication and monitoring
   mechanisms may be withdrawn.
> This might be called poor-boy network management; expedient and ugly,
> but necessary. An SNMP interface is in progress for one of the Unix
> daemons. Same goes for the authentication mechanism, which is a
> necessary feature used to partition the subnet for repair when
> a server comes unglued.

> (portion deleted)

   In DTS a synchronization subnet consists of a structured graph with
   nodes consisting of clerks, servers, couriers and time providers.
   With respect to the NTP nomenclature, a time provider is a primary
   server, a courier is a secondary server intended to import time from
   one or more distant primary servers for local redistribution and a
   server is intended to provide time for possibly many end nodes or
   clerks. Time providers, servers and couriers are evidently generic,
   in that all perform similar functions and have similar or identical
   internal structure.

Not only are they generic, they are dynamic. If a time-provider system loses its radio signal, it immediately reverts to a server, providing graceful degradation in the presence of failures.

> Your enthusiasm is contagious. NTP does exactly the same thing.

The DTS story is actually even better here: we provide a well-defined time provider interface. This can be used to implement a time provider without requiring modification of the protocol portions of the time service. (On Unix systems it uses Unix domain sockets.) This greatly eases adding a new time provider, and permits time provider vendors to supply it with their hardware. Note, NTP could (and probably should) do this also. We have already done it.

> The NTP spec includes a procedure for a time provider interface, although
> the entity interactions are only informally specified. However, the NTP
> interface is substantially the same as the peer interface, while in DTS
> the interface is different.
> Perhaps the most interesting difference is
> that the DTS provider interface expects a series of time values and
> uses the DTS procedures to refine the estimate, which is similar in
> intent to the NTP clock filter, but the NTP clock filter applies to
> all peers in addition to the provider.

> As specified in the introduction, the NTP spec is not intended as a
> formal one (in the best and worst Internet traditions). However, we
> have a little project at UDel to rewrite it in Estelle and throw test
> cases at it. The project has already found a small number of minor
> sleazes and obscurisms. You are to be congratulated on your formal
> approach using Modula2+. Have you subjected the protocol description
> to formal verification and testing? Can you make your Unix daemon
> available for testing? Would you agree to publish the spec document
> as an RFC?

   As in NTP, DTS clients and servers periodically request the time from
   other servers, although the subnet has only a limited ability to
   reconfigure in the event of failure.

I don't understand this statement. Reconfiguration within a LAN is about as complete as one could imagine. The random selection of global servers is robust against any non-partitioning WAN failures.

> My statement was misleading and should be clarified. Assuming the
> global directory service is robust, DTS certainly is robust against
> non-partitioning WAN failures; however, there are only three levels
> in the DTS subnet (global server, courier/server, client). In NTP there
> can be several levels or strata (commonly up to five or more). My comment
> was meant in the context of reforming the NTP subnet as a spanning
> tree rooted at the primary servers when something croaks. This of course
> requires engineered peer paths and prior knowledge of WAN connectivity,
> which is certainly not among the goals of DTS.

> (portion deleted)

   On local nets DTS servers multicast to each other in order to
   construct lists of servers available on the local wire.
   Clerks multicast requests for these lists, which are returned in
   monocast mode similar to ARP in the Internet. Couriers consult the
   network directory system to find global time providers. For local-net
   operation more than one server can be configured to operate as a
   courier, but only one will actually operate as a courier at a time.

This is false; I think you're failing to distinguish between couriers and backup couriers. There can be more than one courier per LAN; each will always synchronize with at least one member of the global set. Backup couriers use an election algorithm in the absence of a courier. Only one backup courier will be elected to function as a courier.

> Correction noted. Do you always expect to have multiple couriers (other
> than the single elected backup) in order to insure diversity and
> redundancy anyway? The local servers check each other for consistency
> and those set as couriers read at least one, but not necessarily more
> than one, global clock.

   There does not appear to be a multicast function in which a personal
   workstation could obtain time simply by listening on the local wire
   without first obtaining a list of local servers.

That is correct; it would violate the principle that a message exchange has to happen in order to correctly assign an inaccuracy.

> There appears to be a considerable Internet constituency which has
> noisily articulated the need for a multicast function when the number of
> clients on the wire climbs to the hundreds. Having responded to the
> articulation noise, I thought it might be a reasonable idea to include
> this capability (so far untested) on LANs with casual workstations,
> promiscuous servers and simple protocol stacks.

> (portion deleted)

   Perhaps the widest departure between the NTP and DTS philosophies is
   the basic underlying statistical model. NTP is based on
   maximum-likelihood principles and statistical mechanics, where errors
   are expressed in terms of expectations.
   DTS is based on provable assertions about the correctness of a set of
   mutually suspicious clocks, where errors are expressed as a set of
   computable bounds on maximum time and frequency offsets. This section
   explores these models and how they affect the quality of service.

> You chose not to respond to the statistical models presented. Does that
> mean you are in substantial agreement with the exposition?

> (portion deleted)

   Both NTP and DTS exist to provide timestamps to some specified
   accuracy and precision. NTP represents time as a 64-bit quantity in
   seconds and fractions, with 32 bits as integral seconds and 32 bits
   as the fraction of a second. This provides resolution to about 200
   picoseconds and rollover ambiguity of about 136 years. The origin of
   the timescale and its correspondence with UTC, atomic time and Julian
   days is documented in [MIL90c]. DTS represents time to a precision of
   100 nanoseconds, although there appears to be no specified maximum
   value.

The DTS time is a signed 64-bit count of 100-nanosecond units since Oct 15, 1582. It will not run out until after the year 30,000 AD, unlike NTP, which will run out in 2036. I, for one, intend to still be alive in 2036! There are two reasons the 100 ns was chosen:

1) We want to use these timestamps as a time representation, for filesystem timestamps, etc. We REALLY don't want to deal with the problem that our representation is inadequate in some reasonably future time. Also, since the 64 bits is signed, times back to 28,000 BC can be represented. This is potentially useful for astronomical data and, happily, includes all of recorded history. If we decreased the resolution, we would give up range. This choice seemed like a reasonable compromise.

2) Since we include the transmission delay in the inaccuracy, 100 ns represents only 30 meters. It's not meaningful to talk about synchronizing clocks below that level with our algorithm.
(I believe it's not meaningful to talk about synchronizing clocks below that level with NTP either.) The total timestamp is 128 bits; this includes a four-bit version number field which would permit these decisions to be revisited in the future.

> I won't argue with your choice of timestamp format. My choice was
> conditioned both by pragmatic issues of compatibility with other Internet
> timekeeping protocols, as well as a perceived need to operate at the
> highest accuracies and precisions capable of national laboratories. As
> for synchronizing clocks with NTP below the 100-ns level, a project to
> do exactly that is in progress here to compare LORAN-C and cesium time.
> Note that not all NTP subnets operate using general-purpose computing
> systems. My own zeal in pursuing the ultimate accuracy and precision
> is largely conditioned by our ongoing work in gigabit network routing
> and network synchronization.

> In any case, the DTS timestamp format including inaccuracy and version
> is a good idea. In principle, the inaccuracy is available in NTP in the
> form of the synchronization distance and dispersion, but this is not
> normally available at the Unix interface.

> (portion deleted)

   With respect to applications involving precision time data, such as
   national standards laboratories, resolutions less than the 100
   nanoseconds provided by DTS are required. Present timekeeping systems
   for space science and navigation can maintain time to better than 30
   nanoseconds, while range data over interplanetary distances can be
   determined to less than a nanosecond. While an ordinary application
   running on an ordinary computer could not reasonably be expected to
   expect or render precise timestamps anywhere near the 200-picosecond
   limit of an NTP timestamp, there are many applications where a
   precision timestamp could be rendered by some other means and
   propagated via a computer and network to some other place for
   processing.
   One such application could well be synchronizing navigation systems
   like LORAN-C, where the timestamps would be obtained directly from
   station timekeeping equipment.

There is an obvious inconsistency in your position here. If you're just using the NTP time format for synchronization, then talking about 136-year rollovers makes some sense; it could be hidden from the users by extending the protocol. If, however, as this paragraph implies, you intend the NTP time format as a general timestamp, then there will be extreme pain in the year 2036. (This is referred to in DEC as the "date75" problem!) To avoid this without unduly extending the timestamp, DTS has traded off being able to use its timestamp format for certain highly precise applications.

> I have vivid memories of shout-out meetings in the early eighties
> where we Interbums staked out positions on what you call the "date75"
> problem. It seems that, no matter what resolution and rollover parameters
> you select, somebody will complain the Big Bang or End of Time cannot
> be represented to femtoseconds. For that matter, while my personal clock
> may expire before 2036, even now I have great pain keeping track with
> conventional date notation of investments that mature after the century
> turns. In NTP I chose to explicitly and purposely leave out the 136-year
> disambiguation function and relegate that to a higher protocol that
> includes both this function and leap-second recording in network
> institutional memory. Since the Earth is winding down in an unpredictable
> way and papal bulls cannot endure forever and we haven't even got the
> Julian days and Gregorian centuries consonant yet, I concluded that
> life is too short and, like astronomers, we all should have used
> (modified) Julian day-fraction reckoning in the first place.

> (portion deleted)

   NTP specifically and intentionally has no provisions anywhere in the
   protocol to specify time zones or zone names.
   The service is designed to deliver UTC seconds and Julian days
   without respect to geographic position, political boundary or local
   custom. Conversion of NTP timestamp data to system format is expected
   to occur at the presentation layer; however, provisions are made to
   supply leap-second information to the presentation layer so that
   network time in the vicinity of leap seconds can be properly
   coordinated. DTS includes provision for time zones and presumably
   summer/winter adjustments in the form of a numerical time offset from
   UTC and arbitrary character-string label; however, it is not obvious
   how to distribute and activate this information in a coordinated
   manner.

The information is used only as a help in user displays. That is, an application can display BOTH the UTC time and the local time at which a timestamp was created. It only costs 12 bits to do this. No use is made of the timezone information by DTS or by systems.

> That clarifies the issue. Your intent is only to qualify the origin
> of the timestamp. Point noted.

   NTP and DTS differ somewhat in the treatment of leap seconds. In DTS
   the normal growth in error bounds in the absence of corrections will
   eventually cause the bounds to include the new timescale and adjust
   gradually as in normal operation. Recognizing that this can take a
   long time, DTS includes special provisions that expand the error
   bounds at such times that leap seconds are expected to occur, which
   can shorten the period for convergence significantly. However, until
   the correction is determined and during the convergence interval the
   accuracy of the local clock with respect to other network clocks may
   be considerably degraded. The accuracy and stability expectations of
   NTP preclude this approach. In NTP the incidence of leap seconds is
   assumed available in advance at all primary servers and distributed
   automatically throughout the remainder of the synchronization subnet
   as part of normal protocol operations.
   Thus, every server and client in the subnet is aware at the instant
   the leap second is to take effect, and steps the local clock
   simultaneously with all other servers in the subnet. Thus, the local
   clock accuracy and stability are preserved before, during and after
   the leap insertion.

Each server has to maintain and propagate this state before the leap insertion. This is, of course, subject to Byzantine failures. A failing server can insert a bad notification.

> Did I miss something? By "propagate this state" do you mean DTS will
> propagate advance notice of leap seconds? From what I can find rummaging
> over the text, it appears that entities are expected to add one second
> to their inaccuracy intervals at the end of June and December, which
> would certainly shorten the convergence period if a leap did in fact
> occur; however, there will be an unpredictable interval following that
> when the clocks are all scurrying to catch up and network time can
> be inconsistent up to a second. I worry about Byzantine failures, too.
> That's why all those NTP timestamp consistency tests and, ultimately,
> the NTP authentication scheme. It would appear that DTS is vulnerable
> to replay in the same way NTP is vulnerable without this scheme.

> (portion deleted)

   At first glance it may appear that NTP and DTS have quite different
   models to determine delay, offset and error budgets. Both involve the
   exchange of messages between two servers (or a client and a server).
   Both attempt to measure not only the clock offsets, but the roundtrip
   delay and, in addition, attempt to estimate the error. The diagrams
   below, in which time flows downward, illustrate a typical message
   exchange in each protocol between servers A and B.
         A          B                  A          B
         |          |                  |          |
      t1 |--------->| t2            t1 |--------->|---  t4
         |          |                  |          |  |
         |          |                  |          |  w
         |          |                  |          |  |
      t4 |<---------| t3            t8 |<---------|---
         |          |                  |          |

             NTP                           DTS

   In NTP the roundtrip delay d and clock offset c of server B relative
   to A is

      d = (t4-t1) - (t3-t2)
      c = ((t2-t1) + (t3-t4))/2.

   This method amounts to a continuously sampled, returnable-time
   system, which is used in some digital telephone networks [LIN80].

The derivation of the expression for 'c' above assumes the two transit delays for this exchange are symmetric. If there are systematically asymmetric transmission delays, then the NTP algorithm will shift the two clocks so that they appear to be synchronized, when in fact they are systematically off by some number of milliseconds. The NTP minimum filter attempts to minimize this effect, assuming that the shortest round-trip exchange would have to be symmetric or nearly so. Unfortunately quite large systematic asymmetric delays can occur for a variety of reasons (source-routed networks, broken routing tables, etc.) and these would apply to all transactions including the shortest. This problem exists in DTS also, but in DTS both of the systems will have an inaccuracy which encompasses the correct time. That is, DTS will not claim to have synchronized clocks to a level which it has not, even in the presence of asymmetric delays. NTP can and has.

> Your observation on asymmetric paths leading to undetectable systematic
> errors with both NTP and DTS is correct and is routinely observed to
> varying degrees on the Internet. In fact, leaving out adjustments
> necessary for frequency offset and precision (in both NTP and DTS) the
> above formulas can be rewritten as presented in the DTS spec. We have
> a project here designed to collect offset data from many or even all
> subnet servers at non-intrusive rates in order to detect and correct
> for asymmetric paths using correlation techniques.
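As a concrete illustration of the NTP delay/offset formulas discussed above (a sketch only; the specification's full procedure also compensates for frequency offset and precision):

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Roundtrip delay d and clock offset c of server B relative to A,
    from the four NTP timestamps: t1 = A sends, t2 = B receives,
    t3 = B replies, t4 = A receives. Assumes symmetric path delays."""
    d = (t4 - t1) - (t3 - t2)
    c = ((t2 - t1) + (t3 - t4)) / 2
    return d, c

# Example: B's clock is 5 ms ahead of A's, one-way delay 10 ms each
# direction, server turnaround 2 ms (all times in milliseconds):
d, c = ntp_delay_offset(t1=0, t2=15, t3=17, t4=22)
# d = 20 (total network delay), c = 5 (B is 5 ms ahead of A)
```

If the two one-way delays were instead, say, 5 ms and 15 ms, the same arithmetic would yield an offset in error by 5 ms, which is exactly the undetectable asymmetric-path error both correspondents acknowledge.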
> I'm not sure what you mean by "NTP can and has" claimed to "have
> synchronized clocks to a level which it has not, even in the presence
> of asymmetric delays." NTP does not claim to synchronize to any level,
> only to minimize the level of probabilistic uncertainty and estimate
> the error incurred. In any case, what NTP calls the synchronizing
> delay represents in fact the error bound relative to the synchronizing
> path to the primary source.

> These are probabilistic data and must be interpreted with respect to the
> probability model which applies to real Internet paths. It may be that,
> with an appropriate queueing model and assumed distribution functions,
> a quantitative error probability function could be derived. Having
> travelled those roads before myself, I conclude my pragmatic approach
> to error estimation is probably as good as any. See [ALL74] for an
> alternative approach.

> (portion deleted)

   Both NTP and DTS have to do a little dance in order to account for
   timing errors due to the precisions of the local clocks and the
   frequency offsets (usually minor) over the transaction interval
   itself. A purist might argue that the expressions given above for
   delay and offset are not strictly accurate unless the probability
   density functions for the path delays are known, properly convolved
   and expectations computed, but this applies to both NTP and DTS. The
   point should be made, however, that correct functioning of DTS
   requires reliable bounds on measured roundtrip delay, as this enters
   into the error budget used to construct intervals over which a clock
   can be considered correct.

However, this is not at all hard to compute. Simply increase the inaccuracy by the potential drift of the local clock during the transaction. The architecture specifies this.

> Not hard to do in NTP either, as the architecture specifies.
> The difference is that in NTP this is represented as a time-insensitive
> bound, since the architecture expects the local-clock algorithm to
> compensate for frequency errors. The system expectation is that the
> (corrected) local clock does not wander more than an architectural
> constant of 30 ms per day. Even in NTP it might be a good idea
> to ratchet up the imputed skew when all sources are lost and the
> bandwidth of the tracking loop is relatively large. This will be
> considered in future.

> (portion deleted)

   NTP maintains for each server both the total estimated roundtrip
   delay to the root of the synchronization subnet (synchronizing
   distance), as well as the sum of the total dispersion to the root of
   the synchronization subnet (synchronizing dispersion).

This synchronizing distance has a rather loose definition. I believe the current NTP RFC suggests using ten times the mean expected error for the synchronizing distance. If this parameter is important to the NTP algorithm I would expect some stronger specification. Also, where does the value ten come from? I know it's experimentally derived and seems to work...

> I must have confused you. Both the distances and dispersions are formally
> defined in the spec. The factor of ten applies only in cases where the
> delay and/or dispersion cannot be measured, such as with some timecode
> receivers. Elsewhere throughout the subnet these quantities are
> calculated. You will observe that the dispersion quantity is rather
> artfully concocted (for efficiency reasons) and not directly convertible
> to the usual second-moment statistics. Well, all I can say is that other
> practitioners of these black arts mumble similar voodoo, but the
> performance as error estimator is still pretty good. Now, I should make
> clear that a goal of NTP is to maintain overall accuracy relative to
> the synchronization distance (roundtrip delay) to the root of the subnet
> on the order of one-tenth that distance.
That is an arbitrary goal, but > believed achievable on the basis of past experience. > There is an interesting feature which becomes evident reading the DTS > and NTP specification documents. The DTS and NTP procedures for reading > server times, computing bounds and selecting sources are roughly the > same complexity, although NTP fiddles with both delay and dispersion. > In addition, the procedures DTS uses to adjust the local clock, compute > the correction interval and determine the next update time have roughly > the same complexity as the NTP local-clock procedure. These quantities are included in the message exchanges and form the basis of the likelihood calculations. Since they always increase from the root, they can be used to calculate accuracy and reliability estimates, as well as to manage the subnet topology to reduce errors and resist destructive timing loops. While you state the synchronizing distance and synchronizing dispersion can be used to calculate accuracy, I have never seen a derivation of how this could be done. This is one of the recurring points, the lack of formal proofs. > Formal proofs are hard to come by, unless you make drastic assumptions > on the statistical models and distributions operative in the Internet. > Certainly, by the same sort of analysis presented in the DTS spec, the > notion of correct UTC time (for a truechimer) belonging to the interval > defined as the offset estimate +-1/2 the delay estimate is valid, as > long as the frequency estimate is within the stated tolerance. However, > DTS and NTP differ substantially in the philosophy of the selection > algorithm, as explained in the text. Since your comments did not speak > directly to this issue and I suspect you remain unconvinced, a full > examination should await another time and place. > For an in-depth analysis of probabilistic models appropriate for > "well-behaved" timekeeping systems, see Appendix F of the latest spec > revision mentioned previously. 
You still might not like the results of > the analysis, since statistical models seldom give deterministic > results. I think you might not argue with a conclusion that accuracy > degrades with increasing synchronization distance and dispersion, but > might argue over the function that maps these numbers into acceptable > error bounds. For justification, see [MIL90a]. In NTP the selection algorithm determines one or a number of synchronization candidates based on empirical rules and maximum-likelihood techniques. A combining algorithm determines the local-clock adjustment using a weighted-average procedure in which the weights are determined by offset sample dispersion. (portion deleted) The next step is designed to detect falsetickers or other conditions which might result in gross errors. The pruned and truncated candidate list is re-sorted in order first by stratum and then by total synchronizing distance to the root; that is, in order of decreasing likelihood. A similar procedure is also used in Marzullo's MM algorithm [MAR85]. Next, each entry is inspected in turn and a weighted error estimate computed relative to the remaining entries on the list. The entry with maximum estimated error is discarded and the process repeats. The procedure terminates when the estimated error of each entry remaining on the list is less than a quantity depending on the intrinsic precisions of the local clocks involved. A point which is not discussed here is that when NTP chooses to prune an entry, it cannot determine if this entry's problem is that it comes from a bad clock (falseticker in your jargon), or experienced unusually large and asymmetric network delays. The latter case is something to be expected in normal operation; the former represents a problem which should be fixed. DTS uses the interval information to identify such bad clocks, and reports them, since if a clock's interval doesn't intersect the majority it is clearly faulty.
This is, of course, a MAJOR issue in distributed system management. > NTP can determine whether a peer or radio has not responded for a "long" > time or whether the problem is excessive dispersion. NTP implementations > do keep track of both and report when a peer or radio becomes selected > or deselected, reachable or unreachable and so forth. After watching peers > and radios of various manufacture continuously for several years and > experiencing what could be considered most bizarre behavior on occasion, > I have concluded there is no way to reliably distinguish a falseticker > from simple excessive delay or propagation variance on other than > a probabilistic basis. I claim this even after admitting the fuzzball > timecode receiver drivers have an incredible array of consistency > checking and monitoring machinery which can and often does detect a > misbehaving peer or radio. I also conclude that radio design can > be vastly improved by providing detailed signal-quality information in > the timecode itself. At one time the fuzzballs carefully and > exasperatingly logged and reported every little thing, like when a > peer or radio became unreachable or experienced excessive dispersion, > etc., but now these events are logged at the server and available only > if the remote monitoring program requests them. The fundamental assumption upon which the DTS is founded is Marzullo's proof that a set of M clocks synchronized by the above algorithm, where no more than j clocks are faulty, delivers an interval including UTC. The algorithm is simple, both to express and to implement, and involves only one sorting step instead of two as in NTP.
However, consider the following scenario with M = 3, j = 1 and containing three intervals A, B and C:

A      +--------------------------+
B      +----+
C                           +----+
Result +-----================-----+

Using the algorithm described in the DTS functional specification, both the lower and upper endpoints of interval A are in M-j = 2 intervals, thus the resulting interval is coincident with A. However, there remains the interval marked "=" which contains points not contained in at least two other intervals. The DTS document mentions this interesting fact, but makes a quite reasonable choice to avoid multiple intervals in favor of a single one, even if that does in principle violate the correctness assumptions. Come on, this in no way violates the correctness assumption. The proofs tell us that the correct time is somewhere in the two dashed sub-intervals. By making the statement that the time is somewhere in the larger interval, a server is making a WEAKER assertion. Marzullo's proof would apply and the algorithm would work (sub-optimally) if servers arbitrarily lengthened the intervals they computed. > Zounds, you have cut me to the quick. My conclusion was based on my > reading of the text in Section 3.3 of the DTS spec and the stated > algorithm, which seemed at first reading to me at variance with > Marzullo's principles presented in the CACM paper. In your algorithm > you arrange the endpoints in a list in order of indicated times, with > lower bounds preceding upper bounds of the same value. For M-j = 2 > and the above figure, the algorithm will start at the lower limits of > A and B and work upward, then start at the upper limits of A and C and > work down. The first step will conclude the lower limit as the lower > limit of intervals A and B and the upper limit as the upper limit of > intervals A and C. Your correctness assumption uses "the smallest > single interval containing all points in at least M-f of the intervals," > which is exactly what your algorithm computes.
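The endpoint sweep just described can be sketched as follows; this is an illustrative reconstruction of the interval-intersection idea, not the DTS code. With intervals shaped like the figure (B sharing A's lower endpoint and C its upper), the result is coincident with A:

```python
def dts_intersection(intervals, f):
    """Smallest single interval containing every point that lies in at
    least M - f of the given (lo, hi) intervals, per the endpoint-sweep
    description above.  Sketch only, not the DTS implementation."""
    m = len(intervals)
    # Endpoint list; lower bounds (-1) sort before upper bounds (+1)
    # of the same value, as the algorithm requires.
    events = sorted([(lo, -1) for lo, hi in intervals] +
                    [(hi, +1) for lo, hi in intervals])
    lower = upper = None
    count = 0
    for point, kind in events:            # sweep upward for the lower edge
        if kind == -1:
            count += 1
            if count >= m - f and lower is None:
                lower = point
        else:
            count -= 1
    count = 0
    for point, kind in reversed(events):  # sweep downward for the upper edge
        if kind == +1:
            count += 1
            if count >= m - f and upper is None:
                upper = point
        else:
            count -= 1
    return lower, upper
```

For example, `dts_intersection([(0.0, 26.0), (0.0, 5.0), (21.0, 26.0)], 1)` returns `(0.0, 26.0)`, the interval A of the figure, even though the middle of that span lies in only one of the three intervals.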
I can restate that by > saying you require at least one clock interval to include UTC, not that > each of the M-j = 2 clocks agrees to the same interval. As I recall, > Marzullo's paper did not consider this case, but it is a natural > extension. I conclude my claim is unfounded and it will disappear in the > rewrite. > (portion deleted) In point of fact, the local clock model described in the NTP specification is listed as optional in the same spirit as the model described in the DTS functional description. As such, the local clock can in principle be considered implementation specific and not part of the formal specification. This is a rather odd statement. What I read is that the local clock model is not explicitly required by the NTP documents, but it is, in fact, required in functioning implementations. > The intent in the original NTP spec was to define the protocol itself, > saving the filtering, selection, combining and local-clock algorithms > for later specification exercises. As a pragmatic matter, nobody would > implement NTP unless there was some guidance for these algorithms. As > the architecture and protocol were refined, it became clear that a > well-performing system of clocks could not be achieved unless > certain aspects of these algorithms were standardized, namely the > parameters of the local-clock algorithm, which is at the heart of > the stability issue. You correctly observe that the NTP spec is > confusing in this area. However, as demonstrated above, frequency compensation requires the local clock adjustment to be carefully specified and implemented. The NTP mechanism has been carefully analyzed, simulated, implemented and deployed in the Internet, but DTS has not. I have never read a clear specification of the required quality of the input time to NTP. However, the following argument shows that in a LAN of typical machines, DTS can indeed provide time to NTP. The clock resolution of most machines is between 1 and 16.7 milliseconds.
Thus, any single measurement made by NTP MUST experience this clock jitter. NTP can achieve better overall results only by averaging many such measurements. We have measured the 'jitter' of DTS times in LANs; it is less than 10 milliseconds, so if DTS supplies time to NTP in a typical LAN, NTP will receive time similar in quality to the time it gets from other NTP servers. In the WAN case the jitter may be a problem; I assume that interoperation in the presence of WAN links may require clock training. If you could provide the derivation of accuracy from synchronization distance and synchronization dispersion that you allude to in section 4.2, this could form the basis of reliable interoperation with NTP supplying time to DTS. Alas, I suspect such a derivation is unachievable. However, for installations which are not concerned with the DTS guarantee, the time provider interface could be used to import NTP time into DTS (just like any time provider, though there would have to be a user-supplied inaccuracy, based on local experience with NTP). We intend to include a sample time provider program to permit this. > As I said previously, and subject to the assumptions made there, the > NTP synchronizing distance is computed similarly to the DTS inaccuracy > interval. However, a derivation of estimated error interval from measured > distance and dispersion is not achievable on other than a statistical > basis, which wouldn't do you much good. However, there is a basic > flaw to your argument in achieving interoperability with NTP. The > NTP architecture involves a probabilistic system of mutually coupled > oscillators controlled by what is called in traditional control theory > a type-II phase-lock loop (PLL). A type-II loop is necessary to estimate > frequency, as well as phase.
If you accept the requirement that the > subnet of distributed oscillators must operate plesiochronously > (phase-locked to possibly many reference oscillators themselves slaved, > but not phase-locked to UTC), then you are stuck with type-II loops. > The fundamental problem with type-II loops is that they can become > unstable and sail off into the wild blue yonder if the loop time > constants are not maintained within specified tolerances. There is > much machinery in the NTP local-clock model that addresses these > issues in order to maintain stability throughout the subnet. It has > been the experience that stability can be reliably maintained over a wide > range of network delays, outages, etc.; however, the cost is a tighter > specification on the dynamic characteristics of the local-clock > algorithms. See Appendix G of the cited NTP spec revision for a > mathematical analysis of the NTP PLL. Note that RFC-1119 contains > minor errors in some of the implementation formulas. > It would in fact be possible to "take time" from a DTS server and splice > it into NTP, in spite of the probably large phase noise; however, it > would probably not be possible to integrate a DTS subnet into an NTP > system where DTS was used for time transfer between one NTP subnet and > another. > (portion deleted) It is an uncontested fact that computer systems can be badly disrupted should apparent time appear to warp (jump) backwards, rather than always tick forward as our universe requires. Both NTP and DTS take explicit precautions to avoid the local clock running backwards or large warps when running forwards. However, both NTP and DTS models recognize that there are some extreme conditions in which it is better to warp backwards or forwards, rather than allow the adjustment procedure to continue for an outrageously long time. The local clock is warped if the correction exceeds an implementation constant, +-128 milliseconds for NTP and ten minutes for DTS.
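The step-versus-slew policy described in the quoted passage can be sketched as follows; the 128-millisecond constant is NTP's, the ten-minute constant is DTS's, and the function itself is purely illustrative:

```python
NTP_STEP_THRESHOLD = 0.128   # seconds
DTS_STEP_THRESHOLD = 600.0   # seconds (ten minutes)

def correction_policy(correction, threshold=NTP_STEP_THRESHOLD):
    """Decide whether a computed correction is applied as a rare step
    (warp) or amortized gradually (slew), so that time never appears
    to run backwards in normal operation."""
    if abs(correction) > threshold:
        return "step"   # accepted only in extreme cases (reboot, etc.)
    return "slew"       # adjust the clock rate until the error is gone
```

A 30-second error, for instance, would be stepped under the NTP constant but slewed under the DTS one, reflecting the different accuracy models and assumed costs of jumping the clock.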
The large difference between the NTP and DTS values is attributed to the accuracy models assumed. I believe the difference also comes from different assumptions of the risks (and probabilistic costs) involved in jumping the clock. We assume it is something you want to do rarely. > The NTP experience is that, with a +-128-ms window and the Internet > peers I watch, I have not observed a jump any time over the last couple > of years, except upon reboot or upon insertion of the latest leap > second, when a couple of silly implementation bugs were found. Some > users have found it necessary to upsize the window on combined > satellite/landline paths and on paths frequently experiencing severe > network congestion. In fact, we have used up to +-512 ms on some paths > to Europe and would be glad to use larger ones should that become > necessary. I think this is a non-issue with respect to comparing > the NTP and DTS models. For most servers and transmission paths in the Internet an offset spike (following filtering, selection and combining operations) over +-128 milliseconds following filtering, selection and combining operations is so rare as to be almost negligible. The duplicated text makes me think there is something wrong here, though frankly I don't understand what this paragraph is trying to say. > Probably awkwardly stated, what I'm trying to say is that the combining > and local-clock algorithms have the effect of reducing apparent errors > following the clock filter by a substantial amount over the "few > tens of milliseconds" assumed by conventional wisdom. See [MIL90a]. > (portion deleted) The service objectives of both NTP and DTS are substantially the same: to deliver correct, accurate, stable and reliable time throughout the synchronization subnet. However, as demonstrated in this document, these objectives are not all simultaneously achievable.
For instance, in a system of real clocks some may be correct according to an established and trusted criterion (truechimers) and some may not (falsetickers). In the models used by NTP and DTS, the distinction between these two groups is made on the basis of different clustering techniques, neither of which is statistically infallible. A succinct way of putting it might be to say that NTP attempts to deliver the most accurate, stable and reliable time according to statistical principles, while DTS attempts to deliver validated time according to correctness principles, but possibly at the expense of accuracy and stability. I would claim you're understating DTS's goals of autoconfigurability and manageability. > I would be glad to elevate the consciousness of this issue in the > rewrite. In both the NTP and DTS models the problem is to determine which subset of possibly many clocks represents the truechimers and which do not. An interesting observation about both NTP and DTS is that neither attempts to assess the relative importance of misses (mislabelling a truechimer as a falseticker) relative to false alarms (mislabelling a falseticker as a truechimer). In signal detection analysis this is established by the likelihood ratio, with high ratios favoring misses over false alarms. In retrospect, it could be said that NTP assumes a somewhat lower likelihood ratio than does DTS. I'm not sure I understand your jargon here. The important trade-off for DTS is to notify managers of broken clocks (calling a falseticker a falseticker) so that they can be fixed. Declaring a good clock bad (labeling a truechimer a falseticker) could only occur in DTS as an implementation error or as a massive multi-server failure. In either case a human will have to get involved. > Likelihood ratio is a tool of mathematics and estimation theory and > is frequently used in statistical signal transmission and detection.
> The likelihood of an event can be computed from the probability > model and assumptions about the underlying events of that model. > For example, there are four possible outcomes of a probabilistic > hypothesis that purports to reveal the results of an experiment: > (1) you said it hit and it really hit, (2) you said it missed and it > really missed, (3) you said it hit, but it really missed and (4) you said > it missed, but it really hit. Now, a complete probabilistic analysis > would require you to place weights on each of these possible outcomes, > from which you can determine the overall success of your hypothesis > construction technique. This is where the likelihood ratio comes in. It might be concluded from the discourse in this document that, if the service objective is the highest accuracy and precision, then the protocol of choice is NTP; however, if the objective is correctness, then the protocol of choice is DTS. However, the discussion in Section 4.2 casts some doubt either on this claim, the DTS functional specification or this investigator's interpretation of it. I believe you are doing your position a disservice by raising this red herring. No one has found your argument that DTS violates the assumptions of Marzullo's thesis convincing. Lamport commented that it indicates a serious misunderstanding of Marzullo's proof. > The last sentence should be struck and tell Leslie I said "hi." It is certainly true that DTS is "simple" and NTP is "complex," but these are relative terms and the complexity of NTP did not result from accident. That even the complexity of NTP is surmountable is demonstrated by the fact that over 2000 NTP-synchronized servers chime each other in the Internet now. The ever-decreasing cost of time providers argues heavily for a simple solution, even though it may require more time providers.
It simply isn't worth a lot of software complexity (and maintenance cost, and management cost) to avoid spending a few dollars to buy more providers. Further, the philosophy of 'correctness' leads to certifiable implementations by independent vendors. > I continue to believe it is not constructive to "certify correctness" in > probabilistic systems, only to exchange acceptable tolerance bounds for > acceptable error bounds. If by "time providers" you imply each is > associated with a radio clock, I do not think it likely that the > cost of a radio clock will plummet to the point that every LAN can > afford one and, even if it did, you cannot trust a single radio. You > have to have more than one of them and, preferably, no common point > of failure between them. > (portion deleted) The widespread deployment of NTP in the Internet seems to confirm that distributed Internet applications can expect that reliable, synchronized time can be maintained to within about two orders of magnitude less than the overall roundtrip delay to the root of the synchronization subnet. For most places in the Internet today that means overall network time can be confidently maintained to a few tens of milliseconds [MIL90a]. While the behavior of large-scale deployment of DTS in internet environments is unknown, it is unlikely that it can provide comparable performance in its present form. With respect to the future refinement of DTS, should this be considered, it is inevitable that the same performance obstacles and implementation choices found by NTP will be found by DTS as well. I disagree with this final paragraph. I think that NTP and DTS both attain their very different goals. Our difference of opinion is in how important the different goals are. I accept that DTS will not keep clocks quite as tightly synchronized as NTP. It will, however, be a product that a vendor can confidently ship to customers who are expected to install, configure and manage it themselves.
> We sure do have vastly different goals. Mine is a scientific one. I am > keenly interested in the technology of synchronizing time and frequency > to the highest degree of performance possible in the present state of > the art. I have found it useful in my own research to promote and > sustain an agenda to systematically refine NTP as an architecture, > protocol and set of implementations and promote its establishment as > an Internet Standard protocol. I also find it useful to promote, help > run and mount experiments with a largish number of Internet hosts which > now find NTP useful. I do not have a commercial agenda, nor do I have a > particular interest in the standards process other than to hope whatever > lessons learned in almost a decade of Internet timekeeping are documented > and made available to the R&D community. You may have seen my message to > the OSF in which I said the same thing and my hope that you guys, who > well might own the standard of choice, thoughtfully consider the points > I raise and think about how those features you think valuable in the > long run might be anticipated now and perhaps added at some future time. (remainder deleted) Dave