VB100 Methodology - ver.1.1

Methodology for the VB100 certification test

This methodology version has been superseded by a newer version; please refer to the most recent version in effect.

 

 

Overview

Test purpose

VB100 is a certification test for endpoint security solutions. A product that earns the VB100 certification can be considered to meet a minimum standard of quality when it comes to the detection of malware.

Please be advised that the static detection testing employed by this test is not suitable for evaluating the full real-world protection potential of a product, nor is it suitable for comparative performance analysis.

 

Test modules

The VB100 test is made up of two major modules:

  • The Certification Test. This measures the performance of the product against a quasi-standard body of samples (changing from test to test) and forms the basis of certification. This module is made up of three identical test parts, exposing the product to the same set of samples over the course of several days, which enables the observation of performance over a longer period and allows recovery in case of technical issues.
  • The Diversity Test. This module provides complementary information to the certification by exposing the product to a much wider selection of samples recently seen by VB and third parties. This test is not included in the certification criteria.

 

Testing procedure

General test outline

The product is installed on a clean, dedicated Windows instance. Initial configuration is performed as per the product default (exceptions apply, see Product setup).

The product is then exposed to various malicious and clean samples, either in on-access test mode (preferred) or in on-demand test mode.

Note that neither malicious nor clean samples are executed in the VB100 test.

Full Internet access is allowed for the tested product throughout the test.

 

On-access test mode

This is the default test mode. The product is exposed to samples through file operations in order to trigger on-access scanning. This is a two-step process:

  1. On-write scanning: The sample is written to the file system. Failed write operations are considered detections.
  2. On-read scanning: As some products will not scan on write, or may limit the scan scope, all successfully written sample files are opened and read using a bespoke tool which records the success of this operation. Failed attempts to open or read files are considered detections.

Both write and read operations use the standard OS APIs.

Note that product logs are not consulted in the process. Detections are identified solely on the basis of the outcome of the open/read/write operations.
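
For illustration, the two-step on-access exposure described above can be sketched as follows. This is a minimal Python sketch, not the bespoke tool used in the lab; the function name and return values are illustrative only.

    def expose_sample(sample_bytes, dest_path):
        """Two-step on-access exposure.

        Returns one of:
          'detected_on_write' - the write failed (counted as a detection)
          'detected_on_read'  - the file could not be opened/read back (counted as a detection)
          'not_detected'      - the file was written and read back successfully
        """
        # Step 1: on-write scanning - attempt to write the sample to disk.
        try:
            with open(dest_path, "wb") as f:
                f.write(sample_bytes)
        except OSError:
            return "detected_on_write"

        # Step 2: on-read scanning - attempt to open and read the file back.
        try:
            with open(dest_path, "rb") as f:
                f.read()
        except OSError:
            return "detected_on_read"

        return "not_detected"

As noted above, the outcome of these file operations alone determines a detection; the sketch therefore never consults product logs.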

 

On-demand test mode

While on-access scanning is preferred, we may use on-demand scanning instead, i.e. interact with the product to get it to scan a file system folder that contains all samples. In this case, detections are identified from the product logs.
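
As an illustration of log-based identification, the sketch below assumes a hypothetical CSV log with 'path' and 'classification' columns, plus a vendor-declared set of classifications that count as hits; real product log formats vary and are handled per product.

    import csv

    # Assumption: the vendor has declared which classifications count as hits.
    HIT_CLASSIFICATIONS = {"malware", "trojan", "pua"}  # illustrative values only

    def detections_from_log(log_path):
        """Return the set of file paths the product reported as hits,
        based on a hypothetical CSV log format."""
        detected = set()
        with open(log_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row["classification"].lower() in HIT_CLASSIFICATIONS:
                    detected.add(row["path"])
        return detected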

 

Test environment

Testing is performed on physical computers or virtual machines. These have average endpoint PC specifications similar to those one would find in a business setting. Each test is carried out on two different systems, one running Microsoft® Windows® 7 Pro N 64-bit and the other running Microsoft® Windows® 10 Pro 64-bit.

 

Sample selection

Sample types

The test exposes the product to samples of various file types. However, the product is expected to detect malicious Portable Executable (PE) files only. For this reason:

  • False negatives are only counted for malicious PE files
  • False positives are counted on any clean file, regardless of type

The PE files used in the test may be 32-bit or 64-bit. Executables (e.g. .exe, .scr) and dynamic link libraries (e.g. .dll, .cpl) may both be used. Statically linked library (DLL) dependencies may not be available, but otherwise the file must be valid. Archive containers (e.g. .zip, .rar) are unpacked and only the resulting files are used.
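
Whether a file is a PE file can be determined from its headers. The following minimal Python sketch (an illustration, not the exact tooling used in the test) checks for the 'MZ' DOS header and the PE signature at the offset stored at 0x3C:

    import struct

    def is_pe_file(path):
        """Minimal PE check: 'MZ' DOS header plus 'PE' signature.
        A sketch only - it does not validate the rest of the file structure."""
        try:
            with open(path, "rb") as f:
                if f.read(2) != b"MZ":
                    return False
                f.seek(0x3C)
                (pe_offset,) = struct.unpack("<I", f.read(4))
                f.seek(pe_offset)
                return f.read(4) == b"PE\x00\x00"
        except (OSError, struct.error):
            return False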

 

Malicious sample sets

We use the following malicious sets in various parts of the test:

  • WildList set: This set contains 'in-the-wild' (ItW) samples from the WildList Organization. The WildList is an extremely well-vetted set of malware recently observed in the wild by researchers. Set size: a few thousand samples total.
  • AMTSO RTTL set: The Real-Time Threat List (RTTL) is a repository of malware samples collected by experts from around the world. The repository is managed by the Anti-Malware Testing Standards Organization (AMTSO). This test uses a continuous feed of new samples. Set size: 1,200–1,300 samples on average (per day of use).
  • The Diversity set: This set contains various recently seen samples of malware collected by VB and received from third parties. Set size: 1,000–2,000 samples on average (per day of use).

 

Clean sample set

The Clean set is a set of clean files collected by VB, consisting of over 400,000 files, all harvested from popular software downloads available on the Internet. The set is regularly maintained with new additions and the purging of old files.

 

Samples selected for testing

For the Certification Test module:

  • The whole of the WildList set is used as the source of malicious samples. In case an issue prevents use of the WildList, the AMTSO RTTL set may be used as fallback.
  • A subset of the Clean set is used. Approximately 100,000 files are selected at random from the Clean set, of which at least 25% are guaranteed to be PE files.

All three repetitions of the test use the same samples, as compiled prior to the test.
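
The selection of the clean subset can be illustrated with a short Python sketch. It assumes the clean set has already been split into PE and non-PE file lists; the function and parameter names are illustrative:

    import random

    def select_clean_subset(pe_files, other_files, total=100_000,
                            min_pe_fraction=0.25, seed=None):
        """Pick `total` files at random, guaranteeing that at least
        `min_pe_fraction` of them are PE files."""
        rng = random.Random(seed)
        min_pe = int(total * min_pe_fraction)
        # Guarantee the PE quota first...
        guaranteed = rng.sample(pe_files, min_pe)
        chosen = set(guaranteed)
        # ...then fill the remainder at random from everything not yet chosen.
        pool = [f for f in pe_files + other_files if f not in chosen]
        return guaranteed + rng.sample(pool, total - min_pe)

With a total of roughly 100,000 files, the 25% guarantee means that at least 25,000 of the selected files are PE files.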

For the Diversity Test module:

  • Approximately 2,000 samples are selected at random from the Diversity set to conduct the test. Note that, due to post-test validation of samples, the Diversity Test results will be based on a subset of these samples.

 

Test stages

The VB100 test consists of the following stages:

  • Product setup
  • Test setup
  • Data collection
    • Certification Test
    • Diversity Test
  • Sample post-validation
  • Feedback and disputes
  • Publication

 

Product setup

The products are set up in the test environment when they are first submitted to the test. Installation is performed on a clean Windows image. The product is configured as per its default installation settings (exceptions apply; see 'Product configuration' under Policies).

An image of the system is captured for use in subsequent tests. The image is maintained in the lab throughout the participation of the product in the test.

 

Test setup

The test setup is the first step in each test cycle:

  • The latest versions of the WildList and the AMTSO RTTL are retrieved.
  • A subset of the Clean set is selected, as described in 'Samples selected for testing'.

 

Data collection

Prior to starting any test part (that is, any one of the three Certification Test parts, or the Diversity Test), the following procedure is performed:

  • An instance of the product is obtained, either by restoring an image or by using an active one.
  • A reasonable effort is made to confirm that the product has completed its own updates successfully before proceeding with the test.
  • Basic operational health is checked by scanning an instance of the EICAR Standard Anti-Virus Test File.

In the Certification Test, this is followed by the execution of the test.

In the Diversity Test, samples are selected as described in 'Samples selected for testing', followed by the execution of the test.
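
The EICAR health check can be illustrated as follows. This is a sketch only; the destination path is hypothetical, and the EICAR content is the publicly documented 68-byte test string:

    # The EICAR standard anti-virus test file content (publicly documented).
    EICAR = (b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$"
             b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")

    def on_access_health_check(path="C:\\vbtest\\eicar.com"):
        """Return True if the product reacts to the EICAR file (write or read
        blocked), i.e. on-access scanning appears to be operational."""
        try:
            with open(path, "wb") as f:
                f.write(EICAR)
            with open(path, "rb") as f:
                f.read()
        except OSError:
            return True   # blocked - the product reacted
        return False      # written and read back - no reaction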

 

Sample post-validation

All purportedly malicious samples used in the test are validated for maliciousness using a 'multi-scanner' approach. This involves scanning each sample with multiple anti-virus engines and using the classification consensus to determine whether the sample is malicious or clean. Samples that cannot be confirmed as belonging to their respective sample set are examined manually. If necessary, such samples are discarded and the test results are adjusted as if they had never been part of the test.

For the Diversity Test, samples are further filtered by randomly picking at most 1,000 validated samples and using only these to compile the test results.
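
A simple form of the multi-scanner consensus can be sketched as below. The exact consensus rule and threshold are not specified by this methodology, so the majority threshold used here is an assumption for illustration:

    def validate_malicious_samples(verdicts, min_ratio=0.5):
        """`verdicts` maps a sample hash to a list of per-engine boolean
        'malicious' verdicts. Samples confirmed by consensus keep their place
        in the set; the rest are flagged for manual examination."""
        confirmed, needs_review = [], []
        for sample_hash, votes in verdicts.items():
            if votes and sum(votes) / len(votes) >= min_ratio:
                confirmed.append(sample_hash)
            else:
                needs_review.append(sample_hash)
        return confirmed, needs_review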

 

Feedback and disputes

The participant receives preliminary feedback on the performance of the product(s) tested on their behalf. This marks the beginning of the dispute period, during which the participant can examine the results and dispute them, if necessary.

Feedback includes at least the following, per product, for the Certification Test:

  • List of samples classified as false positives, by hash, per operating system.
  • List of samples classified as false negatives (i.e. misses), by hash, per operating system. In the case of an excessive number of false negatives, the number of samples shared may be capped.
  • A dispute period of at least 5 business days (as per UK business hours) starting from the issuing of initial feedback. (In some cases the dispute time may be extended at VB's discretion.)

Upon request, VB may also provide:

  • Log and other data files generated by the tested product.
  • Version numbers before and after the tests.
  • False positive or false negative files.

 

Certification criteria

The VB100 certification is issued to a product if it meets the following conditions:

  • It misses no more than 0.5% of malicious samples (false negatives)
  • It mistakes no more than 0.01% of the clean samples as malicious (false positives)

False negatives and false positives are counted once per sample, regardless of the number of test parts or platforms in which they occur.
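
The pass/fail arithmetic implied by these criteria can be expressed as a short sketch. The function is illustrative; misses and false positives are passed in as sets of per-sample hashes, so each sample is counted once across all test parts and platforms:

    def meets_certification_criteria(missed_hashes, total_malicious,
                                     fp_hashes, total_clean):
        """True if the product stays within both certification thresholds."""
        fn_rate = len(missed_hashes) / total_malicious   # false negative rate
        fp_rate = len(fp_hashes) / total_clean           # false positive rate
        return fn_rate <= 0.005 and fp_rate <= 0.0001    # 0.5% and 0.01%

For example, with a clean subset of roughly 100,000 files, the 0.01% limit corresponds to at most 10 false positives.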

Please be advised that the granting of the certification, or the failure to attain it, should be interpreted in light of the purpose of the test, as described at the beginning of this methodology.

 

Policies

We strive to be fair and to provide equal conditions to all tested products and their vendors.

 

Policies by topic

Product configuration

  • The tested product is configured as per its default installation settings, with the following exceptions:
    • Logging may be configured to allow the gathering of sufficient information for test result analysis.
    • Detection response may be configured in a way that facilitates automated testing (e.g. automatic blocking instead of prompting on malware detection; disabling of disinfection or quarantining, etc.).
    • For a product tested in on-demand mode, on-access scanning is disabled (where possible).
    • Instructions provided by the vendor to comply with the test environment are applied.

 

On-access and on-demand test modes

  • A product that does not support on-access detection of some or all samples may opt to be tested in on-demand test mode. Data acquired using either method is considered equally valid.
  • A vendor may request to switch between on-demand and on-access test modes at any time. We encourage vendors to provide us with classification mapping along with their request. The change is to be carried out within 5 business days from the request being received and prerequisites (such as classification mapping, log processing tools, etc.) being met. Changes are applied to all subsequent test parts following receipt of the request, but not to ongoing parts. Virus Bulletin reserves the right to decline the request.

 

Tested program builds

  • Public VB100 tests aim to test product builds that are available to general audiences. Acknowledging that this is not always possible (e.g. in the case of a technical issue that needs immediate addressing, or if specific changes are required in the product in order to be compatible with our test), we accept custom product builds at our discretion. The use of a custom build will be documented in the published test report.
  • During a test, we allow the product to update with any updates delivered through automatic channels, but, except as needed for troubleshooting technical issues, we do not accept manual submissions of update packages.

 

Identification of hits on samples

  • Security products often classify threats into various classes, including grey areas like Potentially Unwanted Applications (PUAs). However, the VB100 test relies on binary classification – there is either a hit on a sample, or there is no hit. For this reason:
    • The vendor of a product tested on demand is urged to declare which classifications are to be treated as 'hits' for the purpose of the test, and how these may be identified from the product logs.
    • For a product tested on access, should the product require non-default configuration in order to block all samples deemed by the vendor as suitable to be counted as 'hits', the vendor is advised to provide VB with configuration instructions.
  • Should VB not receive instructions on the above, the test team will endeavour to determine the answers to the above questions themselves.

 

Sufficient data

  • The test is considered successful if at least two Certification Test parts are successfully completed. A non-default number of test parts will be documented in the test report.
  • In the Diversity Test, at least 750 samples are required to demonstrate the accuracy of the tested product. If the sample count falls below this level after post-test validation and disputes, results for the Diversity Test will not be published in the test report. In such an event, a brief explanation will be provided in the published test report.

 

Withdrawal from the test

  • A product committed to the test may not be withdrawn once testing starts, unless technical problems prevent successful completion of testing.

 

Technical issue resolution

  • Should the product experience significant technical issues that affect test performance:
    • VB may invalidate the affected test part.
    • VB may re-run invalidated test part(s), up to three re-runs total per test.
    • Should the test still not succeed, the product may be withdrawn from the test.

 

Disputes

Disputes are evaluated on a case-by-case basis. The participant is asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, VB reserves the right to make the final decision.

To reflect the broad nature of real-life issues, the scope of disputes is not limited. The following are a few examples:

  • False negatives (e.g. on the basis of a sample being corrupted, greyware, etc.)
  • False positives (e.g. corrupted files, greyware, PUAs, etc.)
  • General performance issues (e.g. the product did not function as expected during the test)

Samples that are successfully disputed are placed on an exclusion list. This list is consulted at certain points during testing, up to the end of the feedback and dispute period.

 

Version history

Version 1.1

  • Certification criteria update
  • New test module 'Diversity Test'
  • Obsolete RAP (Reactive/Proactive) Test removed
  • Running the test inside virtual machines enabled
  • Clean file set usage in the test updated
  • Updates on how false positive and false negative events are counted
  • Updates on what file types are considered false positives and false negatives
  • New or updated policies on:
    • Acceptance of custom product builds in the test
    • Acceptable product updates during testing
    • Prerequisites of on-demand and on-access test modes
    • Switching between on-demand and on-access test modes
    • Sufficient data for certification
  • Minor wording changes

Version 1.0: First published in this format.

 

 

 
