Why our numbers work

By Joel Snyder
Network World, 12/20/04


You may notice that our numbers are not as optimistic as those in vendors' marketing literature. There are four reasons for this:

1. Side effects from our test bed probably shaved a few points off each product's ability to identify spam.

2. We were very strict in our definition of false positives. Because many false positives are mailing-list traffic of marginal use, end users often don't count them when reporting errors: missing a few messages a month from a list that generates 10 a day doesn't bother them. This contributes to the optimistic numbers vendors report based on user experience.

3. Because we ran our tests on more than 10,000 messages from a real-time mail stream, our results are more representative of real-world product behavior than the canned or contrived tests vendors run. Even a few hours' delay in processing mail causes significant deviations in the performance of some products.

4. Most vendors choose to report false-positive rates by dividing false positives by the total number of messages processed, rather than by the number of legitimate messages; no statistician would do that. Some vendors don't explain what they mean by "false-positive rate" at all. We used statistics rigorously defined and agreed on by researchers, and it makes a dramatic difference: in our tests, computing false-positive rates the vendor way would cut the numbers in half, as the sketch below illustrates. For a detailed look at the statistics involved, see "What makes a false positive."
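
To make the difference concrete, here is a minimal sketch in Python using hypothetical counts. The message totals, the even split between spam and legitimate mail, and the false-positive count are assumptions chosen for illustration, not data from our tests.

    # Hypothetical counts -- assumed for illustration only, not figures from our tests.
    total_messages = 10000     # everything the filter processed, spam included
    legitimate = 5000          # legitimate (non-spam) messages; assumes a roughly even split
    false_positives = 50       # legitimate messages the filter wrongly tagged as spam

    # Statistician's definition: errors as a share of the mail that can be
    # falsely flagged in the first place -- the legitimate messages.
    proper_rate = false_positives / float(legitimate)        # 0.01  -> 1.0%

    # Vendor-style arithmetic: errors as a share of all traffic, spam included.
    vendor_rate = false_positives / float(total_messages)    # 0.005 -> 0.5%

    print("False-positive rate (proper): %.1f%%" % (proper_rate * 100))
    print("False-positive rate (vendor): %.1f%%" % (vendor_rate * 100))

Because spam accounts for about half the traffic in this hypothetical stream, the vendor-style figure comes out at half the proper one, which is the halving effect described in point 4.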