
RE: [nm-wg] comments on draft-ggf-nmwg-hierarchy-02



Les,
Excellent. Thanks for the constructive reply,
-- Guy

--On Friday, March 19, 2004 19:08:31 -0800 "Cottrell, Les" <cottrell@slac.stanford.edu> wrote:

Thanks for your careful comments and the time you devoted to reading the
document.

Section 8.1: One-way Delay
The issues are not issues with RFC 2679 itself but rather, as you put
it, observations about how one implements the measurements. As you say,
there is no intent to depart in substance from the RFCs. In the next
revision I will make this clearer, using some of the suggestions you
have made.

Section 8.1.1: Jitter.
We should indeed change the reference to a more persistent one. It would
appear that RFC 3393 would meet that need.

Section 8.2: Roundtrip Delay
See comments for section 8.1

Section 8.3: Issues in Measuring Delay
I can understand your point about the next to last para not really being
about making singleton delay measurements. As you say perhaps a section
on measurement interpretation would make a clearer separation.

-----Original Message-----
From: Guy T Almes [mailto:almes@internet2.edu]
Sent: Friday, March 19, 2004 11:49 AM
To: nm-wg@gridforum.org
Cc: Guy T Almes
Subject: [nm-wg] comments on draft-ggf-nmwg-hierarchy-02

I very much appreciate the work and insights in this document.  I do,
though, have several comments.  These all concern delay.         -- Guy

[Section 8: Delay Characteristics]
Near the bottom of page 20, the text notes that "The above raises several
issues including:" and lists three points. The nature of the three
'issues' seems neither clear nor consistent. The first seems to be an
observation that it's hard to measure one-way delay without synchronizing
time.  Maybe this is really just an observation.  If so, it's exactly
right: doing a good job of measuring one-way delay usually requires lots
of work to succeed at achieving good synchronization.  Is the 'issue'
more than that observation? The second seems to perhaps be seeking
clarification on how one-way delay is defined when the packet is
fragmented.  If this is a claim that RFC 2679 is incomplete on this
point, then that would be good to state.  Otherwise, it's not clear what
the 'issue' is. The third seems similar to the first -- achieving
accuracy in measurements is hard work, and characterizing the accuracy of
a measurement is subtle.  Is that what is being said?
In short, this portion of the document is vague and may or may not be
asking questions about the accuracy or completeness of RFC 2679.  Both
the intent and the substance of the section should be (dropped or)
clarified.

[Section 8.1: One-way Delay]
Is this paragraph taken to be the definition of one-way delay in the
context of the hierarchy? In conversation with one of the authors, I get
the impression that there is no intent to depart in substance from the
metric defined in RFC 2679.  If so, it would be useful to add a simple
statement saying so.  Lots of work went into 2679, and it would be useful
to build on that work.  Perhaps a more carefully crafted statement would
note that the Hierarchy Characteristic is the same as that described in
2679, that good Hierarchy Methodologies might result from following
points made in 2679, and that a Hierarchy Observation of one instance of
one-way delay is consistent with an instance of a 2679 Singleton.  It
might further be noted that the current Hierarchy document does not
attempt to address issues analogous to the Sample and Statistics parts of
2679. If this is done, then those who have worked with 2679 will not be
left with the (incorrect?) impression that the Hierarchy intends to
depart from 2679 on any point of substance (though the Hierarchy document
leaves some points open that 2679 does not leave open).

[Section 8.1.1: Jitter]
You should, of course, avoid citing Internet Drafts in GGF Proposed
Recommendations (since the I-Ds are ephemeral and the PRs are much less
so).

[Section 8.2: Roundtrip Delay]
The relation of Hierarchy Roundtrip Delay to RFC 2681, parallel to that
sketched above for One-way Delay, would be useful.

[Section 8.3: Issues in Measuring Delay]
The next-to-last paragraph goes
off topic and discusses an interesting issue of reporting and of
statistics of sets of delay measurements, specifically in the context of
how to deal with delay Observations that cannot be completed since the
launched packet never arrives.  My first point is that this is off-topic
(since it does not relate to "issues in measuring") and should perhaps be
placed under a separate section called something like "interpretation of
sets of Observations". The cited RFCs take what might be characterized as
a cautious -- perhaps over-cautious -- view and should perhaps be
critiqued from time to time.  The NM-WG should consider that these
cautious positions were not lightly taken, of course. Specifically, as
regards isolated instances of One-way Delay Observations (what 2679 calls
Singletons), it would seem to be useful, and certainly not harmful, to
retain the idea that attempts to launch a one-way delay Observation that
do not complete due to packet loss can be characterized as having
Infinite delay.  How that infinite delay is interpreted is a separate
matter.  In some cases, it might make sense to interpret Infinite as
'inconclusive' (in all good humor, I append a lame joke peculiar to my
native land in which a similar interpretation is made).  We indeed have
much to learn about the phenomena of delay and its impacts on
applications.  Three points summarize part of why the cautious view is
taken in 2679:
<> with respect to the causes of delay and/or loss, it is noted that, in
many cases, packet loss is caused by extreme instances of queueing delay,
as when a packet makes it to a congested router and is then dropped due
to tail-drop or RED.  And these are the losses of most interest to TCP
dynamics.  But, of course, packets are also lost due to bit error rates
or routing instabilities.
<> with respect to impact on applications, packet loss's impact is
indistinguishable from that of a very large delay.  Thus, for example,
many UDP streaming media applications posit some threshold of delay and,
once this delay is encountered, treat the delayed packet as lost even if
it arrives an augenblick later.
<> it is important to avoid saying that a network that delivers a packet
with large delay performed better than one that lost that same packet.
Otherwise, strange confusions can result.
The real problems emerge when one tries to measure a bunch of instances
of one-way delay and summarize network performance with a statistic (what
2679 treats in its Statistics of Samples of one-way delay Singletons).
If one launches 100 one-way delay tests and accurately measures one-way
delay for 95 of them and the other 5 are lost, one must be cautious about
how one treats those five lost packets.  2679 shows how to use
percentile-based statistics to define, for instance, the minimum, the
median, and other percentiles of delay.  A fine point is that the 97th
percentile of delay in the example given when there is 5% packet loss is
itself regarded as Infinite.  Those interested are encouraged to read the
RFC.  I'll make three points here:
<> following 2679 in its treatment of Statistics is clear, if a bit
cautious.
<> it might also be possible to do some extra work and define percentiles
of those packets with finite delay.  For example, if you convince
yourself that loss is *not* due to congestion/queueing, then this might
make sense.  (For example, if you have determined that those five losses
were (almost) certainly caused by bit-error rates or by routing
instabilities.)  In this case, one could imagine a carefully crafted
notion of cooked delay percentiles.  If work is done in this area, by all
means take it (also) back to the IETF IPPM WG.
<> but please do not talk about averages and standard deviations and
such, since this mathematical framework provides no help in dealing with
packet loss.  Moreover, even apart from packet loss, distributions of
delay are often heavy-tailed and in these cases (again, even apart from
issues of loss), means and standard deviations are fragile notions.
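
To make the 100-test example concrete, here is a minimal sketch in
Python.  It is an illustration only, not text from RFC 2679: the function
name delay_percentile, the nearest-rank rule, and the sample delay values
are all assumptions chosen for the example.  Lost packets are recorded as
infinite delay, so they sort above every finite measurement.

```python
import math

def delay_percentile(delays, p):
    """Return the p-th percentile (integer p, 1..100) of one-way delays.

    Lost packets are represented as math.inf.  The nearest-rank rule used
    here (an assumption for this sketch) picks the smallest sample such
    that at least p percent of all samples are <= it.
    """
    if not delays:
        raise ValueError("no samples")
    ordered = sorted(delays)                      # inf sorts last
    rank = math.ceil(p * len(ordered) / 100)      # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical sample: 95 measured delays (ms) plus 5 lost packets.
measured = [20.0 + 0.1 * i for i in range(95)]
sample = measured + [math.inf] * 5

median = delay_percentile(sample, 50)    # finite
p95 = delay_percentile(sample, 95)       # finite: the largest measured delay
p97 = delay_percentile(sample, 97)       # infinite: falls among the losses

# The mean gives no help once a single loss is present.
mean = sum(sample) / len(sample)         # infinite

# A "cooked" percentile over finite delays only; reasonable only if the
# losses are known not to come from congestion/queueing.
cooked_p97 = delay_percentile(measured, 97)
```

With 5% loss the median and 95th percentile stay finite while the 97th
percentile is Infinite, and the mean is Infinite as soon as one packet is
lost.  The cooked value is finite, but it silently discards the losses,
which is why it needs the extra justification described above.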

I would add, by the way, that our community is likely to learn a great
deal about the phenomena of networks from accurate measurement of one-way
delay and from careful interpretation of these phenomena.

Ah, the promised lame joke.  A researcher at the local agricultural and
mechanical college is presenting to a colleague on an experiment
concerning economizing on cattle feed.  The researcher explained that he
started with 100% cattle feed one week.  Then, on the second week, he fed
the subject cow with 90% feed and 10% sawdust.  The animal looked fine
and seemed healthy.  Then, on the third week, he fed it 80% feed and 20%
sawdust.  Again, the animal was OK and the researcher appeared on the way
to a breakthrough in helping ranchers save money on cattle feed.  At this
point, the colleague interrupted and asked what the eventual outcome was.
The researcher replied that, in the end, the experiment was inconclusive.
"What do you mean, 'inconclusive'?" asked his colleague.  "Well," the
researcher explained, "just as the experiment was nearing its completion,
the fool cow just up and died."