Posted by Virus Bulletin on Feb 15, 2012
Comparative test did not take false positives into account.
Researchers from Cascade Insights performed a comparative spam filtering test on the three major webmail providers and concluded that Hotmail performed best, closely followed by Gmail, with Yahoo! a distant third.
The researchers registered accounts at all three providers and, for comparison, a fourth one at an ISP that did not perform any kind of filtering. They then seeded these accounts by sending 'replies' to known spammers, posting the addresses on public sites and submitting them to dubious-looking websites.
For the test period, the researchers manually classified all the emails they received, marking as 'ham' all legitimate emails as well as those that were solicited - even if they were scams. They then calculated the percentage of spam in each of the inboxes and found that a little less than half of the email in Hotmail's inbox was spam, a ratio almost matched by Gmail. In Yahoo!'s case, over 58 per cent of the inbox consisted of spam, compared to close to two thirds for the unfiltered account.
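To make the metric concrete, here is a minimal Python sketch of the calculation described above. The function name and the counts in the example are hypothetical and chosen only for illustration; they are not figures from the Cascade Insights test.

```python
def inbox_spam_percentage(spam_in_inbox: int, ham_in_inbox: int) -> float:
    """Return the percentage of inbox messages that were classified as spam."""
    total = spam_in_inbox + ham_in_inbox
    if total == 0:
        raise ValueError("inbox is empty; the percentage is undefined")
    return 100.0 * spam_in_inbox / total


# Hypothetical inbox: 58 messages marked as spam, 42 marked as 'ham'.
example = inbox_spam_percentage(58, 42)
```

Note that the result depends only on what ended up in the inbox; it says nothing about how much spam was sent to the account or how much legitimate mail was blocked.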
The researchers did not look at false positive rates. They were right to say that the emails they classified as 'ham' (many of which were of a dubious nature) weren't representative of a real stream of legitimate emails. It is a shame, however, that they didn't find another way to send legitimate mail to the accounts: avoiding false positives is an important part of running a spam filter, and a good spam catch rate is useless if many legitimate emails are blocked as well.
A possibly more serious flaw in the test was the way performance was measured. Measuring the percentage of spam in the inbox is a nice idea. After all, users of free webmail services care little about the amount of spam sent to them; rather, they care about the amount of spam that makes it into their inbox. However, the percentage of spam in an inbox depends on how much 'ham' arrives there as well as on how much spam does. In using this method, the researchers thus implicitly took the 'ham' in their mail streams into account - the same messages that they had previously claimed could not be used for measuring false positives.
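The dependence on the 'ham' stream can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name, its parameters and the message counts are hypothetical and do not come from the test.

```python
def inbox_spam_share(spam_sent: int, spam_caught: int,
                     ham_sent: int, ham_blocked: int) -> float:
    """Percentage of the inbox that is spam, given what was sent and filtered."""
    spam_in_inbox = spam_sent - spam_caught   # spam the filter missed
    ham_in_inbox = ham_sent - ham_blocked     # ham that was not a false positive
    total = spam_in_inbox + ham_in_inbox
    if total == 0:
        raise ValueError("inbox is empty; the percentage is undefined")
    return 100.0 * spam_in_inbox / total


# Two hypothetical accounts with the same filter performance on spam
# (900 of 1,000 spam messages caught) but different amounts of ham.
more_ham = inbox_spam_share(spam_sent=1000, spam_caught=900,
                            ham_sent=100, ham_blocked=0)
less_ham = inbox_spam_share(spam_sent=1000, spam_caught=900,
                            ham_sent=50, ham_blocked=0)

# Same ham volume as the first account, but half of it wrongly blocked.
with_false_positives = inbox_spam_share(spam_sent=1000, spam_caught=900,
                                        ham_sent=100, ham_blocked=50)
```

In this sketch the spam catch rate is identical in all three cases, yet the account that receives less ham, or loses ham to false positives, ends up with a larger share of spam in its inbox. The metric therefore mixes filter performance with the make-up of the 'ham' stream.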