
Re: [nm-wg] treatment of lost packets when measuring delay



"Cottrell, Les" wrote:
> 
> You can represent loss as infinite delay if you use percentiles (e.g. the median) to characterize the distribution. Percentiles and inter-quartile ranges (IQR) are more robust to outliers than means and standard deviations, and they depend less on assumptions about the shape of the distribution (e.g. heavy-tailed vs. Gaussian). For most purposes, then, percentiles are preferable, and one might choose to report the median and IQR rather than the mean and standard deviation.
> 
> A separate question is whether to include lost packets as infinite delays at all. That is much less clear to me. I prefer to treat lost packets separately and not fold them into the median and IQR.

If you compute percentiles without including the infinite delays, how
would you actually observe the full tail? You have an incomplete
distribution. I don't think the median and IQR are meaningful without
the full distribution. They are obviously a statistical summary of the
subset, but what is that information worth?
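To make this concrete, here is a minimal sketch, assuming nearest-rank percentiles and lost packets recorded as `math.inf`; the sample delays (in ms) are invented for illustration and are not measurements. Dropping the three losses moves the median and shrinks the IQR from unbounded to a few milliseconds, so the "received only" summary looks far tighter than what the path actually delivered.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list.

    Lost packets stored as math.inf sort after every finite delay,
    so they land in the upper tail where they belong.
    """
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def median_iqr(values):
    """Return (median, IQR) using nearest-rank percentiles."""
    q1, med, q3 = (percentile(values, p) for p in (25, 50, 75))
    # If the 75th percentile is a loss, the spread is unbounded; this also
    # avoids computing inf - inf (NaN) when q1 is a loss as well.
    iqr = math.inf if math.isinf(q3) else q3 - q1
    return med, iqr

# Hypothetical one-way delays in ms; math.inf marks a lost packet.
samples = [10, 12, 11, 13, 50, math.inf, math.inf, 14, 12, math.inf]
received_only = [d for d in samples if not math.isinf(d)]

print(median_iqr(samples))        # (13, inf)
print(median_iqr(received_only))  # (12, 3)
```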

I think we need to step back a moment and ask why anyone would want to
look at this data. Take an application or user that needs to receive 10
packets of information and wants to use these statistics to estimate
how long that will take. It seems clear to me that it needs the
information from the full distribution.
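A rough sketch of that 10-packet question follows. It assumes packet delays are independent and identically distributed (real paths often violate this; losses in particular tend to be bursty), builds the empirical CDF with lost packets counted as never arriving, and reuses invented sample delays. The helper names are hypothetical. Computed from received packets only, the chance that all 10 arrive within 14 ms comes out roughly 35 times too optimistic.

```python
import math

def empirical_cdf(delays, t):
    """Fraction of probes whose delay is <= t.

    Lost probes are stored as math.inf, so for any finite t they count
    as not having arrived, which is exactly what the application sees.
    """
    return sum(1 for d in delays if d <= t) / len(delays)

def prob_all_arrive_by(delays, t, n_packets):
    """P(all n_packets arrive within t), ASSUMING independent,
    identically distributed per-packet delays."""
    return empirical_cdf(delays, t) ** n_packets

# Hypothetical one-way delays in ms; math.inf marks a lost packet.
samples = [10, 12, 11, 13, 50, math.inf, math.inf, 14, 12, math.inf]
received_only = [d for d in samples if not math.isinf(d)]

print(prob_all_arrive_by(samples, 14, 10))        # about 0.006
print(prob_all_arrive_by(received_only, 14, 10))  # about 0.214
```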

If you include the full distribution of delays, there is no need to
consult another metric (namely loss) to interpret the delay metric. I
believe we should avoid semantic dependencies between metrics at this
level if we can.

jeff

P.S.
I fully understand the desire to be able to aggregate all the values
into a simple mean. It is far easier, especially for further
aggregation. I have nothing against reporting the mean as one possible
statistic, as long as it is mathematically correct. (If any value in
the input is missing, the mean is missing too.) It is important that
the statistics we report are useful, not just easy to report.