David Harley writes to the director of research at the SANS Institute to express his concerns about Consumer Reports' AV testing methodology.
Copyright © 2006 Virus Bulletin
Dear Mr Paller,
Thank you for letting us know that the Consumer Reports methodology for testing anti-virus software by creating new variants is fair and rigorous.
We of the anti-virus brotherhood are always grateful for crumbs of enlightenment from the table of the Great and the Good of the security establishment far beyond the walls of our own little ivory towers. Nevertheless, we don't believe you were altogether correct in this instance.
Let's start with some admissions.
What virus scanners do best is find (and hopefully remove) known viruses. They are not so good at detecting and removing unknown viruses. The model of adding definitions to detect each virus as you find out about it has a fatal flaw: it means that the anti-virus vendor is always 'playing second' to the malware author. And yes, it is an approach that works better for the vendor's revenue stream than for the consumer's proactive protection. However, it does a worthwhile job of removing malware that has already kicked in, and of keeping many PC systems clear of the common viruses still in circulation.
But that is not what modern scanners do. At least, it's not all they do. They use a variety of proactive techniques, which means that they're capable of detecting some unknown viruses as they appear, and before they've been analysed in the vendor's lab. So when you stated in the SANS Newsbytes newsletter that anti-virus vendors don't find and block viruses quickly, you were working from a model that is many years out of date. You also seem to imply that anti-virus vendors are still updating their products every few weeks or months (as was the case in the past), whereas most vendors now update their products at least daily, and usually make detection available within hours of an in-the-wild virus being reported to them.
Of course, heuristic analysis and generic signatures don't catch 100% of unknown malware, or anything like it. In fact, since malware authors dedicate a serious amount of R&D time to patching their creations until the main anti-virus products don't detect them, anyone who thinks that up-to-date scanners can offer perfect protection needs a reality check.
That's one of the reasons why savvy security administrators use AV scanners as just one component of a multilayered defence strategy, as a supplement to other generic/proactive approaches. They use them to clean up outbreaks where proactive defences fail, and to ensure that the many malicious programs still circulating months or years after their discovery don't get a foothold on the sites under their wing. And this is why anti-virus is really a multi-functional product nowadays.
We (anti-virus vendors, independent researchers and testers, canny AV users like the members of AVIEN, and so on) already know all this, so this test isn't really 'important product improvement research', is it? But it does point to a massive failure on our part. We have tried, but failed, to educate both the general public and the wider security community about what anti-virus really is, how it really works, and how important it is to use non-reactive defences and very rigorous testing practices.
It's not so surprising that we’ve failed to educate home users, when there is so much misinformation to compete with. But clearly we still can't expect a fair hearing from other sectors of security, either. And when they get it wrong, they mislead a whole load of other people.
So, is it wrong to test a scanner's ability to detect heuristically? Of course not, if it's done competently. Was this a competent test? Well, we don't really know. Only the barest bones of the Consumer Reports methodology have been published. Since the testers are working outside the AV research community – which is far more collaborative than anyone outside it will ever believe – we really don’t know whether they know any more about this specialist area than the average end user.
Back in the days when I was less easily depressed, I tracked some of the 'tests' that were circulating at that time. Testers were using collections of alleged viruses found on 'vx' websites. These were known to contain large numbers of garbage files such as random text files, snippets of source code, intendeds (viruses that couldn’t actually replicate, and therefore weren't viruses), corrupted viruses that couldn't work, programs generated by virus generators which may or may not have been viable viruses, the infamous Rosenthal utilities, and (my particular favourite) 'virus-like' programs (I've often wondered what that meant). Even then, testers were trying to test a scanner's heuristic ability by generating 'variants'. Inserting snippets of virus code at random places in a test file. Patching presumed infected files in random places. Changing text strings found in virus bodies on the assumption that that was what scanners were looking for. Concealing them in objects like Word documents where they could never be found naturally, or 13 levels down in an encrypted archive. What they didn't do, almost without exception, was make any attempt to check that what they were testing with was, in every case, a valid, working virus.
Perhaps the Consumer Reports test was better than that, though a quote from Evan Beckford suggests that virus generators may have been used – and these are notoriously unreliable when it comes to producing viable viruses. Unless more data is published on the methodology used for these tests, or the test collection is submitted for independent verification, how will we know whether the test is valid?
For all we know, the collection could consist of 5,500 garbage files. (I don't know whether it is significant that most of the files generated were not actually used.) Just think about that scenario for a moment. If this were the case, the scanners that scored the highest number of detections would be hitting high volumes of false positives. If even some of the test files were invalid, you wouldn't just be testing heuristic capability any more: you'd be testing the sensitivity of the products' heuristics, and their whole detection philosophy. Perhaps that's a valid test objective, but not the one that seems to have been intended.
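To make that scenario concrete, here is a minimal sketch in Python of how a raw detection count splits once each test file has been independently checked. Everything in it is an assumption for illustration: the `TestFile` record, its fields and the `score` function are hypothetical, and nothing here describes how Consumer Reports actually scored its test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestFile:
    """One file in a test collection (hypothetical record layout)."""
    name: str
    is_valid_virus: bool  # independently verified as a working virus
    flagged: bool         # the scanner under test reported it as malicious


def score(files):
    """Split a scanner's raw hit count into true detections and false positives."""
    valid = [f for f in files if f.is_valid_virus]
    invalid = [f for f in files if not f.is_valid_virus]
    true_detections = sum(1 for f in valid if f.flagged)
    false_positives = sum(1 for f in invalid if f.flagged)
    return {
        "raw_score": true_detections + false_positives,
        "true_detections": true_detections,
        "false_positives": false_positives,
        "valid_total": len(valid),
    }
```

With an unverified collection only `raw_score` is observable, so a product that flags garbage files looks better than one that correctly ignores them.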
All this, of course, presupposes that all the scanners tested were configured appropriately and consistently. In real life, some of the many amateur sites that run new malware against multiple scanners and publish comparative results for that malware have been known to penalize individual products by using out-of-date definitions (or signatures, if you must) or over-conservative settings. Again, we simply don't know how well this was done. I will, for the purpose of this note, assume that at the very least all the necessary precautions were taken to avoid the inadvertent release of these variants beyond the testing labs (as is claimed).
Is it wrong to create new test viruses and variants? The anti-virus industry is very leery of creating viruses for any purpose: some anti-virus researchers won't do that under any circumstances, and probably none will do so when it isn't necessary. It's ironic that half the world is convinced it is members of the AV companies that write the malware, while the industry itself obsesses about keeping its hands clean, not employing virus writers and so on.
I won't say it is never necessary to write a new variant, or replicative code for testing and development purposes: that is a decision that is best left to the individual researcher. But it is not necessary to write viruses to test anti-virus heuristics. A less contentious approach is the retrospective test, where you 'freeze' a scanner without updates for a period of time then test with a batch of malware that has appeared subsequent to the cut-off point. This needs to be done very carefully, but it avoids the ethical conflicts and many of the technical uncertainties, and it is a better test of a scanner’s capabilities than throwing at it objects that may or may not be valid viruses.
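The sample-selection step of a retrospective test can be sketched as follows. This is only an illustration under stated assumptions: the `Sample` record, its fields, and the `retrospective_set` and `detection_rate` functions are hypothetical names, and a real test would also need careful control of scanner configuration and sample verification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """A malware sample in the tester's collection (hypothetical record layout)."""
    sha256: str
    first_seen: date        # when the sample first appeared
    verified_working: bool  # confirmed to be valid, working malware


def retrospective_set(samples, freeze_date):
    """Keep only verified samples that appeared after the scanner was frozen."""
    return [
        s for s in samples
        if s.verified_working and s.first_seen > freeze_date
    ]


def detection_rate(test_set, detected_hashes):
    """Fraction of the test set that the frozen scanner detected."""
    if not test_set:
        raise ValueError("test set is empty")
    hits = sum(1 for s in test_set if s.sha256 in detected_hashes)
    return hits / len(test_set)
```

Because every sample in the resulting set post-dates the last update, any detection must come from heuristics or generic signatures rather than from a definition added after the fact.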
Given our previous history of disagreement over virus issues, you may be surprised to know that I still think SANS does some excellent work. However, your commentary suggests that, like so many security gurus, you may have succumbed to the inability to say 'I don't know enough about that speciality to make a useful or valid comment.' Perhaps you need to catch up with some of the literature on testing, maintaining a collection (Vesselin Bontchev's paper is still very relevant), and so on. You might also want to look up a very old (but still too relevant for comfort) article by Alan Solomon on how to bias a comparative review, as well as Igor Muttik's very recent response to the CR test.
Robert Slade and I wrote a long chapter on testing issues in our book on viruses. (Although I was lukewarm about retrospective testing then, I've seen it work well in practice since.) You could consider checking out some of the organizations that offer competent independent testing, such as AV-Test.org, av-comparatives.org, ICSA Labs and Virus Bulletin. Have you read Peter Ször's book yet, I wonder?
The anti-virus industry is far from perfect. But it includes some amazingly competent people, some of whom have thought long and hard about the testing issue, and work closely with equally competent independent researchers and testers. Some of them may even know more than people who don't work in or on the fringes of the industry. Just a thought.
See Virus Bulletin September 2006, p.2.
Bontchev, V. 1993. http://www.people.frisk-software.com/~bontchev/papers/virlib.html.