Guy,
We would never underestimate the need to include percentile-based statistics in data presentation. The fact that our current system has been designed to be extensible enough to include new types of statistical analysis should prove that we will not be satisfied with average values alone. We currently provide the average value of the samples (along with the minimum, maximum and 95th percentile) as a representation of OWD data to the casual end user, and we consider this a first step in this activity. We believe that such end users would be inclined towards seeing average values included in the data provided to them. There is no harm in providing extra information, right?
Loukik,
I'm not at all sure that I agree.
There are several reasons why averages do harm:
# There is the strong, though implicit, understanding that an average
of (say) k samples is a good estimate of the mean of the underlying
distribution that the samples were drawn from. In many cases, the
underlying distribution is very heavy tailed and thus the mean may
not even exist. Thus it may be seriously misleading.
# Even if the (finite) mean does exist, it conveys little insight, in
that the minimum value (more or less propagation delay) and some
measure of the heaviness of the tail (more or less how much queueing
delay is going on) are what need to be understood. The mean (again,
if it exists) conveys neither.
# The average is very fragile with respect to a few outliers. For
example, consider two cases: in one you have 100 packets, one of
which has a ridiculously large delay of 5 seconds; in the other
you replace that one 5-second packet with a 10-second packet. The
average now goes up by 50 msec (5 extra seconds spread over 100
packets), so one unlucky packet warps the measure.
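The arithmetic in the last point can be checked with a minimal sketch. The delay values here are hypothetical (99 packets at 20 msec plus one outlier) and `summarize` is an illustrative helper, not part of any measurement tool:

```python
import statistics

# Hypothetical sample: 99 packets with a 20 msec delay (values in seconds).
typical = [0.020] * 99

def summarize(delays):
    """Return the mean, median and minimum of a list of delays in seconds."""
    return {
        "mean": statistics.fmean(delays),
        "median": statistics.median(delays),
        "min": min(delays),
    }

case_a = summarize(typical + [5.0])    # one packet delayed by 5 seconds
case_b = summarize(typical + [10.0])   # same packet delayed by 10 seconds

# Difference between the two means, converted to milliseconds.
shift_ms = (case_b["mean"] - case_a["mean"]) * 1000
print(f"mean shift:   {shift_ms:.1f} msec")
print(f"median shift: {(case_b['median'] - case_a['median']) * 1000:.1f} msec")
```

Under these assumptions the mean moves by 50 msec while the median and the minimum do not move at all, which is the fragility described above.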
And this is totally apart from the problem that the desire to define
an 'average' is one driver for ignoring the very-bad cases (e.g. the
losses which are sometimes just elements of that heavy tail).
Averages are *wonderful* when the underlying distribution is something
like a normal distribution.
But packet delay is almost never anything like a normal distribution.
Does this help?
-- Guy
Without convincing discussions, we would disagree with any opinion about representing packet loss as infinite delay instead of just reporting it as packet loss. A discussion on this issue was what I was hoping for.
Regards,
Loukik.
Guy T Almes wrote:
Loukik,
Two points should be considered:
[] one of the (many) advantages of percentile-based statistics is that you can do it either among the entire data set (counting the losses as infinite delay) *or* among any subset (e.g. those with finite delays) if you have reason to believe that that subset has significance.
[] note that, even apart from any debate about how to treat losses, the distribution of delays is very often heavy-tailed. Thus even to talk about means and (especially) standard deviations carries implicit assumptions about the mathematical nature of the distribution.
Thus, even though the non-math-majors among us are naturally drawn to averages etc (the statistics we've been taught since third grade or so), we should understand that in the land of heavy-tailed distributions, these are suspect.
By the way, I agree with Stas's note. But I offer these two points in addition.
Regards,
-- Guy
--On Thursday, April 01, 2004 11:38:02 +0100 Loukik Kudarimoti <loukik.kudarimoti@dante.org.uk> wrote:
stanislav shalunov wrote:
Loukik Kudarimoti <loukik.kudarimoti@dante.org.uk> writes:
During the TPM workshop, we realized that there is a need to come to a common understanding of OWD data representation (esp. treatment of packet loss).
For what it's worth, the OWAMP specification is written so that send times of all packets are known with fair precision to the receiver (despite the element of pseudo-randomness in the timings). Then, if a packet does not arrive within a specified timeout, it is considered lost; the send timestamp of such a packet is known and reported.
My concern is more towards the representation.
When interpreting the results, lost packets simply have infinite delay, don't they? This makes certain statistics meaningless (such as mean delay), but if a value of an estimator becomes undefined because of the presence of a small number of infinite values, the estimator is not robust, and, therefore, should probably be avoided anyway. Percentiles in general do not suffer from this problem. (Harmonic average works fine with infinite numbers, too, if one wanted to insist on using non-robust---but more robust than mean---averaging mechanisms.)
If ten packets were sent between times t1 and t2 and 1 was *lost* (referred to as infinite delay), we report that *1 packet between times t1 and t2 was lost* and 9 packets have an average (arithmetic mean) value v1, min value v2, max value v3 and 95 %ile v4 (extensible to include other types of aggregations as well). A full report (with no aggregations) can also be provided. Now in such a report, whether we show packet loss as infinite delay or report it as packet loss still needs to be discussed.
Regards,
Loukik.
--
Loukik Kudarimoti
Network Engineer
Francis House, 112 Hills Road
Cambridge CB2 1PQ, United Kingdom
D A N T E
WWW: http://www.dante.net
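The ten-packet case discussed in this thread can be sketched as follows. The delay values are hypothetical, and `percentile` (a simple nearest-rank definition) and `harmonic_mean` are illustrative helpers chosen for this sketch, not anything taken from OWAMP or from the reporting system described above:

```python
import math

def percentile(delays, p):
    """Nearest-rank percentile for p in (0, 100]; works with math.inf values."""
    ordered = sorted(delays)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def harmonic_mean(delays):
    """Harmonic mean; 1.0 / math.inf is 0.0, so a lost packet adds nothing to the sum."""
    return len(delays) / sum(1.0 / d for d in delays)

# Hypothetical report: ten packets sent, nine received, one lost (delays in seconds).
received = [0.021, 0.020, 0.023, 0.020, 0.022, 0.025, 0.020, 0.021, 0.030]
with_loss = received + [math.inf]   # the lost packet counted as infinite delay

median_all = percentile(with_loss, 50)       # finite despite the loss
p95_all = percentile(with_loss, 95)          # lands on the lost packet
p95_received = percentile(received, 95)      # same statistic on the finite subset
mean_all = sum(with_loss) / len(with_loss)   # arithmetic mean becomes infinite
hmean_all = harmonic_mean(with_loss)         # stays finite

print(median_all, p95_all, p95_received, mean_all, hmean_all)
```

In this sketch the median over all ten packets stays finite and the harmonic mean stays finite, while the arithmetic mean is infinite. With one loss in ten, any percentile above the 90th lands on the lost packet, so the 95th percentile over the full set is infinite; the same function applied to the received subset gives a finite value, which is the "entire data set or any subset" choice mentioned above.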