Thanks for your careful comments and the time you devoted to reading the document.

Section 8.1: One-way Delay

The issues listed are not issues with RFC 2679 itself; rather they are issues, or as you put it, observations, to do with how one implements the measurements. As you say, there is no intent to depart in substance from the RFCs. In the next revision I will make this clearer, using some of the suggestions you have made.

Section 8.1.1: Jitter

We should indeed change the reference to a more persistent one. It would appear that RFC 3393 would meet that need.

Section 8.2: Roundtrip Delay

See the comments for Section 8.1.

Section 8.3: Issues in Measuring Delay

I can understand your point that the next-to-last paragraph is not really about making singleton delay measurements. As you say, perhaps a section on measurement interpretation would make a clearer separation.

-----Original Message-----
From: Guy T Almes [mailto:almes@internet2.edu]
Sent: Friday, March 19, 2004 11:49 AM
To: nm-wg@gridforum.org
Cc: Guy T Almes
Subject: [nm-wg] comments on draft-ggf-nmwg-hierarchy-02

I very much appreciate the work and insights in this document. I do, though, have several comments. These all concern delay.

-- Guy

[Section 8: Delay Characteristics]

Near the bottom of page 20, the text notes that "The above raises several issues including:" and lists three points. The nature of the three 'issues' seems neither clear nor consistent.

The first seems to be an observation that it's hard to measure one-way delay without synchronizing time. Maybe this is really just an observation. If so, it's exactly right: doing a good job of measuring one-way delay usually requires lots of work to achieve good synchronization. Is the 'issue' more than that observation?

The second seems to be seeking clarification on how one-way delay is defined when the packet is fragmented. If this is a claim that RFC 2679 is incomplete on this point, then that would be good to state.
Otherwise, it's not clear what the 'issue' is.

The third seems similar to the first -- achieving accuracy in measurements is hard work, and characterizing the accuracy of a measurement is subtle. Is that what is being said?

In short, this portion of the document is vague and may or may not be asking questions about the accuracy or completeness of RFC 2679. Both the intent and the substance of the section should be (dropped or) clarified.

[Section 8.1: One-way Delay]

Is this paragraph taken to be the definition of one-way delay in the context of the hierarchy? In conversation with one of the authors, I get the impression that there is no intent to depart in substance from the metric defined in RFC 2679. If so, it would be useful to add a simple statement saying so. Lots of work went into 2679, and it would be useful to build on that work.

Perhaps a more carefully crafted statement would note that the Hierarchy Characteristic is the same as that described in 2679, that good Hierarchy Methodologies might result from following points made in 2679, and that a Hierarchy Observation of one instance of one-way delay is consistent with an instance of a 2679 Singleton. It might further be noted that the current Hierarchy document does not attempt to address issues analogous to the Sample and Statistics parts of 2679.

If this is done, then those who have worked with 2679 will not be left with the (incorrect?) impression that the Hierarchy intends to depart from 2679 on any point of substance (though the Hierarchy document leaves some points open that 2679 does not leave open).

[Section 8.1.1: Jitter]

You should, of course, avoid citing Internet Drafts in GGF Proposed Recommendations (since the I-Ds are ephemeral and the PRs are much less so).

[Section 8.2: Roundtrip Delay]

A statement of the relation of Hierarchy Roundtrip Delay to RFC 2681, parallel to that sketched above for One-way Delay, would be useful.
[Section 8.3: Issues in Measuring Delay]

The next-to-last paragraph goes off topic and discusses an interesting issue of reporting and of statistics of sets of delay measurements, specifically in the context of how to deal with delay Observations that cannot be completed since the launched packet never arrives.

My first point is that this is off-topic (since it does not relate to "issues in measuring") and should perhaps be placed under a separate section called something like "interpretation of sets of Observations".

The cited RFCs take what might be characterized as a cautious -- perhaps over-cautious -- view and should perhaps be critiqued from time to time. The NM-WG should consider that these cautious positions were not lightly taken, of course.

Specifically, as regards isolated instances of One-way Delay Observations (what 2679 calls Singletons), it would seem to be useful, and certainly not harmful, to retain the idea that attempts to launch a one-way delay Observation that do not complete due to packet loss can be characterized as having Infinite delay. How that infinite delay is interpreted is a separate matter. In some cases, it might make sense to interpret Infinite as 'inconclusive' (in all good humor, I append a lame joke peculiar to my native land in which a similar interpretation is made).

We indeed have much to learn about the phenomena of delay and its impacts on applications. Three points summarize part of why the cautious view is taken in 2679:

<> with respect to the causes of delay and/or loss, it is noted that, in many cases, packet loss is caused by extreme instances of queueing delay, as when a packet makes it to a congested router and is then dropped due to tail-drop or RED. And these are the losses of most interest to TCP dynamics. But, of course, packets are also lost due to bit error rates or routing instabilities.

<> with respect to impact on applications, packet loss's impact is indistinguishable from that of a very large delay.
Thus, for example, many UDP streaming media applications posit some threshold of delay and, once this delay is encountered, treat the delayed packet as lost even if it arrives an augenblick later.

<> it is important to avoid saying that a network that delivers a packet with large delay performed better than one that lost that same packet. Otherwise, strange confusions can result.

The real problems emerge when one tries to measure a bunch of instances of one-way delay and summarize network performance with a statistic (what 2679 treats in its Statistics of Samples of one-way delay Singletons). If one launches 100 one-way delay tests and accurately measures one-way delay for 95 of them and the other 5 are lost, one must be cautious about how one treats those five lost packets. 2679 shows how to use percentile-based statistics to define, for instance, the minimum, the median, and other percentiles of delay. A fine point is that the 97th percentile of delay in the example given, where there is 5% packet loss, is itself regarded as Infinite. Those interested are encouraged to read the RFC.

I'll make three points here:

<> following 2679 in its treatment of Statistics is clear, if a bit cautious.

<> it might also be possible to do some extra work and define percentiles of those packets with finite delay. For example, if you convince yourself that loss is *not* due to congestion/queueing, then this might make sense. (For example, if you have determined that those five losses were (almost) certainly caused by bit-error rates or by routing instabilities.) In this case, one could imagine a carefully crafted notion of cooked delay percentiles. If work is done in this area, by all means take it (also) back to the IETF IPPM WG.

<> but please do not talk about averages and standard deviations and such, since this mathematical framework provides no help in dealing with packet loss.
Moreover, even apart from packet loss, distributions of delay are often heavy-tailed and in these cases (again, even apart from issues of loss), means and standard deviations are fragile notions.

I would add, by the way, that our community is likely to learn a great deal about the phenomena of networks from accurate measurement of one-way delay and from careful interpretation of these phenomena.

Ah, the promised lame joke.

A researcher at the local agricultural and mechanical college is presenting to a colleague on an experiment concerning economizing on cattle feed. The researcher explained that he started with 100% cattle feed one week. Then, on the second week, he fed the subject cow with 90% feed and 10% sawdust. The animal looked fine and seemed healthy. Then, on the third week, he fed it 80% feed and 20% sawdust. Again, the animal was OK and the researcher appeared on the way to a breakthrough in helping ranchers save money on cattle feed.

At this point, the colleague interrupted and asked what the eventual outcome was. The researcher replied that, in the end, the experiment was inconclusive.

"What do you mean, 'inconclusive'?" asked his colleague.

"Well," the researcher explained, "just as the experiment was nearing its completion, the fool cow just up and died."
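
Postscript, as an illustration of the percentile point made earlier (100 tests, 95 measured, 5 lost): a minimal Python sketch, assuming lost packets are recorded as infinite delay and that the Xth percentile is taken as the smallest sample value with at least X% of the sample at or below it. The function name delay_percentile and the delay values are invented for illustration; they do not come from RFC 2679 or from the Hierarchy document, and the RFC itself remains the authority on the exact definition.

```python
import math


def delay_percentile(delays, percent):
    """Return the percent-th percentile of a list of one-way delays.

    Lost packets are assumed to be recorded as math.inf, so they sort
    after every finite delay. 'percent' is an integer in 1..100.
    """
    if not delays:
        raise ValueError("need at least one delay value")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(delays)
    # Smallest rank such that at least percent% of the sample is at or below it.
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


# 95 measured delays (milliseconds, made up for this example) and 5 lost packets.
sample = [20.0 + 0.1 * i for i in range(95)] + [math.inf] * 5

median = delay_percentile(sample, 50)
p95 = delay_percentile(sample, 95)
p97 = delay_percentile(sample, 97)

print("median finite:", math.isfinite(median))
print("95th percentile finite:", math.isfinite(p95))
print("97th percentile infinite:", math.isinf(p97))
```

With 5 of the 100 tests lost, the median and the 95th percentile come out finite, while the 97th percentile is infinite. A mean over the same sample would simply be infinite, which is one way to see why percentiles are the more useful summary here.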