VBSpam Methodology

How the VBSpam comparative tests are carried out

This document describes the methodology used for Virus Bulletin's anti-spam comparative tests.

Participation

VBSpam is a twice-monthly comparative test for spam filters. The test report is made available free of charge and does not require registration or subscription.

Once a product has been accepted for testing, it may not be withdrawn from the review without 21 days' notice from the vendor. Full details are provided in the vendor test agreement (please contact vbtest@virusbulletin.com for a copy of the test agreement).

Products must be submitted in a form in which they are available to the end-user. Developers will be permitted to make changes (should they be required) in order for their product to function in the testing environment - but in all instances the VB test team must be made fully aware of such changes.

It is the developers' responsibility to inform the VB test team of any set-up or configuration changes that need to be made in order for the product to work in the testing environment.

If you are interested in submitting your product for future tests, please contact vbtest@virusbulletin.com for more details.

Test environment

Each product will receive all emails from the same fixed IP address; as a result, filtering that is based on the connecting IP address cannot be used. However, products may perform IP or domain-based filtering using the information in the Received headers that will be added to the emails to reflect the fact that the email has passed through the test network's MTA. If requested by developers, the IP address can also be added to the emails in an X-Forwarded-For header. There is also the possibility of providing a filter with this information during the SMTP transaction, thus enabling pre-DATA filtering.

More on pre-DATA filtering

Because products in the VBSpam tests are tested in parallel, they will see all email arrive from the same fixed IP address and with the same EHLO domain. It is thus not possible (and certainly not advisable) to use these for filtering.

Because the MTA that relays all emails to all products adds a Received header, many products use the content of this header. It is also possible to have the original IP address added in an extra header (e.g. X-Original-IP).

A third option is for the MTA to send an extra SMTP command, right after identifying itself using EHLO. This command will contain the sender's IP address, the corresponding reverse DNS domain and the HELO/EHLO domain used. An example of such a command is XCLIENT, which is an extension for the Postfix MTA. While products won't gain any more information, they are encouraged to use XCLIENT (or something equivalent), to emulate a real situation, where most of the spam is blocked before the DATA command has been sent.
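As an illustration of this third option, the following Python sketch formats an XCLIENT command from the three values mentioned above. The attribute names NAME, ADDR and HELO, the xtext encoding and the [UNAVAILABLE] placeholder are taken from the Postfix XCLIENT documentation; the function names and sample values are hypothetical, and the exact attributes sent by the test MTA are not specified in this document.

```python
from typing import Optional


def xtext_encode(value: str) -> str:
    """Encode a value as xtext (RFC 3461).

    '+', '=' and any byte outside the printable range 33..126
    are written as '+XX' (two upper-case hex digits).
    """
    out = []
    for byte in value.encode("utf-8"):
        if 33 <= byte <= 126 and byte not in (ord("+"), ord("=")):
            out.append(chr(byte))
        else:
            out.append("+%02X" % byte)
    return "".join(out)


def build_xclient(addr: str, rdns: Optional[str], helo: str) -> str:
    """Build an XCLIENT line carrying the original client's details.

    addr: IP address of the original sender (IPv4 assumed here).
    rdns: reverse DNS name, or None if no record was found.
    helo: HELO/EHLO domain used by the original sender.
    """
    # Postfix uses the literal "[UNAVAILABLE]" for unknown values.
    name = rdns if rdns else "[UNAVAILABLE]"
    return "XCLIENT NAME={} ADDR={} HELO={}".format(
        xtext_encode(name), xtext_encode(addr), xtext_encode(helo)
    )


if __name__ == "__main__":
    # Sent by the relaying MTA after EHLO and before MAIL FROM.
    print(build_xclient("12.34.56.78", "xxx.example", "yyy.example"))
```

A product that accepts such a command can then apply its IP and HELO checks as if the original sender had connected directly.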

As every email is relayed from a fixed IP address, greylisting should be turned off. If an email cannot be delivered, up to five redelivery attempts will be made at 20-minute intervals, but this can be disabled on a per-product basis if required. However, unlike in previous tests, this does not apply to 5xx responses during the SMTP connection: these are considered to mark the email as spam, and the email will not be resent.
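The delivery rules in the preceding paragraph can be sketched as a small decision function. This is only an illustration of the rules as stated: the function name, the return labels and the use of None for a failed connection are assumptions, not a description of the test MTA's actual implementation.

```python
from typing import Optional

MAX_REDELIVERIES = 5          # "up to five redelivery attempts"
RETRY_INTERVAL_MINUTES = 20   # wait between attempts


def next_step(reply_code: Optional[int], redeliveries_made: int) -> str:
    """Decide what to do after a delivery attempt.

    reply_code: final SMTP reply code from the product, or None if
                the connection failed (hypothetical convention).
    redeliveries_made: redelivery attempts already made for this email.
    """
    if reply_code is not None and 200 <= reply_code < 300:
        return "delivered"
    if reply_code is not None and 500 <= reply_code < 600:
        # A permanent rejection is treated as a spam verdict; no retry.
        return "spam"
    if redeliveries_made < MAX_REDELIVERIES:
        # Temporary failure: try again after RETRY_INTERVAL_MINUTES.
        return "retry"
    return "give_up"


if __name__ == "__main__":
    print(next_step(451, 0))  # temporary failure on the first attempt
```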

Each product, as far as it contains or works together with an MTA, is required to relay the filtered emails to a back-end MTA; the classification should be mentioned in the header in a clear and unambiguous way. Email that has not reached the back-end MTA one hour after its original delivery will be considered to have been marked as spam.

Filters hosted locally should use at least two DNS servers.

The VB test team should be provided with a technical contact with both a good knowledge of the product and an understanding of the testing environment.

Classifying

All products are required to classify each email into one of two categories: 'ham' or 'spam'. If a filter uses other classifications (e.g. 'phishing', 'virus', 'possible spam'), the test team must be informed as to whether these classifications are to be considered ham or spam.

Products are not required to check emails for malware, but spam emails containing malware should be marked as spam. Should a ham message contain a malicious attachment, it will be removed from the test set (thus a product will not be penalized for blocking malware).

Filters will not be trained.

Filters are not permitted to use any ad hoc rules based on properties of the mail streams used in our test, e.g. automatically blocking all email in certain languages or alphabets, or whitelisting certain senders. The testers will perform regular tests to make sure this is not happening.

Email corpus

The email corpus consists of the traffic sent to a number of legitimate mailing lists, as well as spam corpora provided by Project Honey Pot and Abusix, which are randomly assigned to an address on the vbspamtest.com domain. All emails are forwarded in real time. Most of the mailing list emails have their headers rewritten so that they appear to have been sent from the original sender directly to Virus Bulletin.

More on the VBSpam ham corpus

It is impossible to test a spam filter's capability of blocking spam without measuring at the same time what percentage of legitimate email is blocked. To this end, a good test requires a ham corpus of a decent size and good, representative quality.

In previous tests, the legitimate email sent to @virusbtn.com addresses was used as a ham corpus. However, while this corpus is very real, it has the downside that the emails in it cannot be shared with participants (for reasons of privacy, among others). Because being able to verify false positives and use them to improve filters is an important part of the test, some changes have been made that enable us to use a ham corpus that we can share in full.

The main part of the ham corpus consists of emails sent to public mailing lists. These emails are sent by the sender to a list server which makes some minor modifications to the subject and the contents, adds several headers and relays the email to each subscriber. In our test, we remove the headers added by the list server and make it appear as if the email was sent directly to the virusbtn.com domain; this includes using the original values for HELO/EHLO, the sender's IP address and its reverse DNS in the Received header and MAIL FROM in the SMTP envelope. Tests have confirmed that this gives a good ham corpus that products can filter without any major problems.

It should be noted that the subject and the body are not changed, nor are the pre-existing headers of the email. This gives some possibilities for 'cheating', e.g. by whitelisting emails that contain certain strings in the subject. This is explicitly not allowed; checks will be made to make sure no product is doing this.

It is possible for the ham corpus to contain some mailing lists that have not been modified and whose sender is the list server. A small number of legitimate newsletters will also be added to the ham corpus.

The products will receive approximately one email every two seconds.

Each email will be left unchanged, with the exception of a Received header, which is added as follows:

Received: from xxx.xxx.xxx (HELO yyy.yyy.yyy [12.34.56.78]) by mx.vbspamtest.com (qpsmtpd/0.81) with [E]SMTP id dddddddddddddddddddd.ppp; Tue, 12 May 2009 18:20:24 +0100

where the date is the current date and time, 12.34.56.78 is the IP address of the sending MTA, yyy.yyy.yyy is the domain in the HELO/EHLO command, and xxx.xxx.xxx is its reverse DNS record (defaults to Unknown if no reverse DNS record is found). The string dddddddddddddddddddd.ppp is used by the back-end MTA to uniquely identify both the email and the product.
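For products that filter on this header, the fields described above could be extracted along the following lines. This Python sketch assumes the header has exactly the layout shown in the example; the regular expression, the function name and the dictionary keys are illustrative and not part of the test specification.

```python
import re
from typing import Dict, Optional

# Matches the layout of the Received header added by the test MTA.
RECEIVED_RE = re.compile(
    r"^Received:\s+from\s+(?P<rdns>\S+)\s+"
    r"\(HELO\s+(?P<helo>\S+)\s+\[(?P<ip>[^\]]+)\]\)\s+"
    r"by\s+(?P<by>\S+)\s+.*?\bid\s+(?P<id>[^;\s]+);\s*(?P<date>.+)$",
    re.DOTALL,
)


def parse_test_received(header: str) -> Optional[Dict[str, str]]:
    """Return the sender details from the header, or None if it does not match."""
    # Undo header folding (a line break followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header.strip())
    match = RECEIVED_RE.match(unfolded)
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = (
        "Received: from xxx.xxx.xxx (HELO yyy.yyy.yyy [12.34.56.78]) "
        "by mx.vbspamtest.com (qpsmtpd/0.81) with [E]SMTP "
        "id dddddddddddddddddddd.ppp; Tue, 12 May 2009 18:20:24 +0100"
    )
    fields = parse_test_received(example)
    # 'rdns' is "Unknown" when no reverse DNS record was found.
    print(fields["ip"], fields["helo"], fields["rdns"])
```

Only the first Received header added by mx.vbspamtest.com carries these values; headers further down the message belong to the original email.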

The HELO/EHLO domain of the SMTP envelope will be changed to reflect delivery from a local MTA; the MAIL FROM and RCPT TO values will be the same as those of the original message.

Awards

VBSpam verified

On completion of the test, a weighted false positive (WFP) rate and average spam catch (SC) rate (#spam caught/#spam) will be computed for each product.

The WFP rate is defined as the false positive rate of the ham and newsletter corpora taken together, with emails from the latter corpus having a weight of 0.2:

WFP rate = (#false positives + 0.2 * min(#newsletter false positives, 0.2 * #newsletters)) / (#ham + 0.2 * #newsletters)

Products whose spam catch rate minus five times the weighted false positive rate (SC - [5xWFP]) is greater than or equal to 98 will earn a VBSpam award:

(SC - [5xWFP]) ≥ 98
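The calculation can be sketched in Python as follows. Two assumptions are made here: both rates are expressed as percentages (implied by the threshold of 98), and '#false positives' in the WFP formula refers to false positives in the ham corpus. The function names and the sample figures are illustrative only.

```python
def sc_rate(spam_caught: int, spam_total: int) -> float:
    """Spam catch rate as a percentage."""
    return 100.0 * spam_caught / spam_total


def wfp_rate(ham_fps: int, newsletter_fps: int,
             ham_total: int, newsletter_total: int) -> float:
    """Weighted false positive rate as a percentage.

    Newsletter false positives are capped at 20% of the newsletter
    corpus and then weighted by 0.2, as in the formula above.
    """
    capped = min(newsletter_fps, 0.2 * newsletter_total)
    return 100.0 * (ham_fps + 0.2 * capped) / (ham_total + 0.2 * newsletter_total)


def final_score(sc: float, wfp: float) -> float:
    """SC minus five times WFP."""
    return sc - 5 * wfp


def earns_vbspam(sc: float, wfp: float) -> bool:
    return final_score(sc, wfp) >= 98


if __name__ == "__main__":
    # Made-up figures: 99,500 of 100,000 spam emails caught,
    # 2 false positives in 9,900 ham emails, 5 in 500 newsletters.
    sc = sc_rate(99_500, 100_000)
    wfp = wfp_rate(2, 5, 9_900, 500)
    print(round(sc, 2), round(wfp, 2), round(final_score(sc, wfp), 2))
```

With these made-up figures the weighted count is 2 + 0.2 * 5 = 3 false positives over a weighted corpus of 9,900 + 0.2 * 500 = 10,000 emails, so the WFP rate is 0.03% and the final score is 99.5 - 0.15 = 99.35.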

VBSpam+

Meanwhile, the VBSpam+ logo is awarded to products that combine a spam catch rate of 99.50% or higher with zero false positives and no more than 2.5% false positives among the newsletters.

For each product, up to four false positives will be counted per sender, where two senders are considered to be the same if the value of MAIL FROM is the same or very similar and the sending IP address is on the same /24 network.
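The per-sender cap could be applied along these lines. In this Python sketch, 'the same or very similar' is simplified to a case-insensitive exact match on MAIL FROM, which is an assumption; the document does not define the similarity test. IPv4 addresses are assumed, and the function names are illustrative.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

MAX_FPS_PER_SENDER = 4


def sender_key(mail_from: str, ip: str) -> Tuple[str, str]:
    """Group senders by normalised MAIL FROM and /24 network."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return (mail_from.strip().lower(), str(network))


def counted_false_positives(fps: Iterable[Tuple[str, str]]) -> int:
    """Count false positives, with at most four per sender.

    fps: (MAIL FROM, sending IP address) for each false positive.
    """
    per_sender = Counter(sender_key(mail_from, ip) for mail_from, ip in fps)
    return sum(min(n, MAX_FPS_PER_SENDER) for n in per_sender.values())


if __name__ == "__main__":
    # Six false positives from one sender on 12.34.56.0/24, one from another.
    fps = [("list@example.org", f"12.34.56.{i}") for i in range(1, 7)]
    fps.append(("other@example.net", "98.76.54.32"))
    print(counted_false_positives(fps))
```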

 
