Virus Bulletin :: VB100 Methodology

Overview

VB100 is a certification test for Windows endpoint security solutions.

Certification objective

VB100 seeks to establish whether the static detection layer of the tested product is capable of detecting common, file-based Windows threats, without generating an excessive number of false alarms for legitimate programs.

Test process outline

The test exposes the product to both malicious and legitimate program samples, and records the tested product’s response to these samples.

Exposure to samples begins with downloading and saving each sample to the local file system in the presence of the product. For a subset of executable samples, a process launch is initiated. Next, the product performs an on-demand scan of the remaining samples. Finally, any residual samples are inventoried, and their integrity is verified.

Products are expected to demonstrate their protection capabilities by their intervention (or the lack thereof) in this process. Any samples that make it to the end of this process without any errors or changes are considered “not detected”, whereas all other samples are considered “detected”.

Test case sets

Test case sets are compiled frequently to include freshly acquired samples. We aim to base each new test for the product on the most recent test set available. The same test sets may be used for multiple products if they are tested close together in time.

Test case sets are composed of two subsets:

Certification Set: Assorted Windows malware recently observed in the wild. Set size varies, generally including 1,000 - 2,000 purely Windows PE-type cases. Samples are curated by Virus Bulletin from a variety of sources, predominantly our own threat intelligence (at least 50%) and several third-party public, private and commercial sources, to increase the sample type diversity. We seek to select threats that represent (or recently represented) a clear risk to the consumer, and to exclude potentially unwanted applications, game hacks, cryptominers and other ambiguous cases.
Clean Set: Widely and less widely used legitimate program files. This set contains 100,000 samples, selected randomly from a much larger repository of samples. This set includes both PE (at least 25%) and miscellaneous file types. All samples are curated by Virus Bulletin.

Test results compilation

After completing the test process, the product’s responses are referenced against the respective body of the test cases and are sorted into the following categories:

True positive = malicious test case identified as malicious (malware detected)
True negative = legitimate test case identified as legitimate (legitimate treated as such)
False positive = legitimate test case identified as malicious (false alarm)
False negative = malicious test case identified as legitimate (malware missed)

Rates for each set are determined and verified against the test criteria. A test report is published (regardless of whether the test criteria are met) and the certified state of the product is reviewed.
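The four categories and the per-set rates can be illustrated with a minimal sketch. The function names and the `(is_malicious, detected)` tuple layout are assumptions made for this example, not part of the VB100 tooling.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify(is_malicious: bool, detected: bool) -> str:
    """Map one test case to TP, FN, FP or TN."""
    if is_malicious:
        return "TP" if detected else "FN"
    return "FP" if detected else "TN"


def rates(cases: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) as fractions.

    Each case is an (is_malicious, detected) pair; this layout is an
    assumption of the sketch.
    """
    counts = Counter(classify(malicious, detected) for malicious, detected in cases)
    malicious_total = counts["TP"] + counts["FN"]
    legitimate_total = counts["FP"] + counts["TN"]
    tp_rate = counts["TP"] / malicious_total if malicious_total else 0.0
    fp_rate = counts["FP"] / legitimate_total if legitimate_total else 0.0
    return tp_rate, fp_rate
```

The true positive rate is computed over malicious cases only and the false positive rate over legitimate cases only, which matches the per-set rates referred to above.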

Criteria

Test criteria

A product is considered to have passed a test if the product receives any grade better than Grade F.

Grades are determined based primarily on the true positive rates of the Certification Set.

Grade	True positive rate requirement
Grade A+	>= 99.5%
Grade A	>= 97%
Grade B	>= 90%
Grade C	>= 85%
Grade D	>= 75%

The product receives Grade F and fails the test if it:

achieves a true positive rate lower than 75% for the Certification Set, or
generates a false positive rate greater than 0.05% for the Clean Set.
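As an illustration, the grading rules above can be expressed as a small function. This is a sketch only: the function name and its count-based parameters are assumptions, and exact fractions are used so that boundary values such as exactly 0.05% are not distorted by floating-point rounding.

```python
from fractions import Fraction

# Minimum true positive rate (in percent) for each grade, best grade first.
GRADE_THRESHOLDS = [
    ("A+", Fraction("99.5")),
    ("A", Fraction(97)),
    ("B", Fraction(90)),
    ("C", Fraction(85)),
    ("D", Fraction(75)),
]
# A false positive rate above this value (in percent) means Grade F.
MAX_FALSE_POSITIVE_RATE = Fraction("0.05")


def grade(true_positives: int, malicious_total: int,
          false_positives: int, clean_total: int) -> str:
    """Return the grade for one test; parameter names are illustrative."""
    tp_rate = Fraction(100 * true_positives, malicious_total)
    fp_rate = Fraction(100 * false_positives, clean_total)
    if fp_rate > MAX_FALSE_POSITIVE_RATE:
        return "F"
    for name, minimum in GRADE_THRESHOLDS:
        if tp_rate >= minimum:
            return name
    return "F"
```

With a Clean Set of 100,000 samples, 0.05% corresponds to 50 false positives, so under this reading 50 false alarms would still pass while 51 would result in Grade F.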

Certification criteria

Certified status can be earned and maintained by products that are tested frequently and successfully meet the test criteria.

The product earns its certified status with the publication of a successful (passed) test report.
The product retains its certified status until any of the following happens:
- The product generates two failed tests in a row.
- The last successful (passed) test report becomes 180 days old.
- VB’s mandate to certify the product terminates without a follow-up plan and 60 days have elapsed since the termination date.
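The rules above can be sketched as a status check. The function below is a hypothetical illustration: it assumes that "two failed tests in a row" refers to the two most recent reports, that "becomes 180 days old" means 180 or more days have passed, and that `mandate_ended` is only supplied when the mandate terminated without a follow-up plan.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def is_certified(reports: Sequence[Tuple[date, bool]], today: date,
                 mandate_ended: Optional[date] = None) -> bool:
    """reports: (publication date, passed) pairs, ordered oldest to newest."""
    published = [report for report in reports if report[0] <= today]
    passed_dates = [published_on for published_on, passed in published if passed]
    if not passed_dates:
        return False  # certification is earned with the first passed report
    if len(published) >= 2 and not published[-1][1] and not published[-2][1]:
        return False  # two failed tests in a row
    if today - passed_dates[-1] >= timedelta(days=180):
        return False  # last passed report is 180 days old
    if mandate_ended is not None and today - mandate_ended >= timedelta(days=60):
        return False  # mandate ended and 60 days have elapsed
    return True
```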

Detailed test information

Test environment

The test is performed on virtual machines (VMs), hosted by a Type I (bare metal) hypervisor. VMs are provided with resources approximately equivalent to an average endpoint business PC specification.

The test platform is x64 Windows 11 (edition unspecified), regularly maintained with updates from Microsoft.

Detailed test process description

Test data collection is performed by a custom test agent application developed by Virus Bulletin. This agent runs on the test VM, interacts with test case sets, triggers responses from the product, and generates detailed log-based evidence. These logs later serve as the basis for generating the test results.

The testing process has two distinct phases, each of which consists of several steps.

Real-Time Phase. For every sample in the test case set:
1. The sample is downloaded through HTTP(S) from VB-operated infrastructure into memory. Note that HTTP merely acts as a means to transport samples to the test system and no attempt is made to simulate in-the-wild URLs.
2. The integrity of the in-memory sample is verified, by calculating the SHA-256 hash of the in-memory content and comparing that hash against the expected reference hash.
3. The sample is written from the memory to the file system.
4. The integrity of the sample in the file system is verified, by opening and reading the sample file, calculating its SHA-256 hash and verifying the calculated value against the expected reference hash.
5. A suspended Windows process is created for the sample if it meets the eligibility criteria.
Follow-Up Phase. This phase is started manually. We aim to keep the gap between the phases to a minimum.
1. Test engineers request the product to perform an on-demand scan on the (remainder of) samples present in the file system. Actions proposed by the product (e.g. quarantining, removal, disinfection, etc.) are carried out.
2. The test agent is started again to inventory the state of the samples on the system. For each sample, the agent performs an integrity check in precisely the same manner as described in step 4 of the Real-Time Phase.
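The SHA-256 integrity checks used in both phases can be sketched as follows. This is not the VB100 test agent's code; the function names and the status strings ("intact", "modified", "missing", "unreadable") are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(content: bytes) -> str:
    """Hash in-memory content, as in the in-memory check after download."""
    return hashlib.sha256(content).hexdigest()


def check_file(path: Path, expected_sha256: str) -> str:
    """Open and read a persisted sample, then compare its SHA-256 hash."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large samples are not loaded at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    except FileNotFoundError:
        return "missing"      # the sample is no longer on the file system
    except OSError:
        return "unreadable"   # opening or reading the sample failed
    if digest.hexdigest() != expected_sha256.lower():
        return "modified"     # content differs from the reference hash
    return "intact"
```

Any status other than "intact" would indicate that something changed between writing the sample and checking it, which is the kind of evidence the inventory step collects.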

Products may intervene in this process in several places. Any of the below may be treated as intervention from the product:

Network Interception: Transmission/connection termination, HTTP errors and any successful HTTP transmission (HTTP/200) with failed integrity checks for the transmitted content.
Access Control: A broad range of I/O errors that prevent opening, reading or writing samples.
Removal: The disappearance of successfully persisted samples from the file system (e.g. deletion, quarantining, renaming, etc.).
Process Launch Intervention: A failure to launch a process, depending on the Windows error code.
Modification: Integrity check failures after e.g. disinfection, disarming, etc.

Note that a single test case may trigger multiple responses from the product. In such cases, the test deems the product’s first response the most relevant. In a typical example, the tested product may allow downloading and writing a malicious sample to the file system, but it will block opening of the sample for reading (Access Control). The file will be removed during the subsequent on-demand scan, so the inventory process will record a Removal type of intervention for the sample. As the design considers the first intervention to be the most relevant, the ultimate test case outcome will be the first, Access Control-type intervention.
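The "first response wins" rule can be illustrated with a short sketch. The enum and function names are hypothetical; the category names are taken from the list of interventions above.

```python
from enum import Enum
from typing import Optional, Sequence


class Intervention(Enum):
    NETWORK_INTERCEPTION = "Network Interception"
    ACCESS_CONTROL = "Access Control"
    REMOVAL = "Removal"
    PROCESS_LAUNCH = "Process Launch Intervention"
    MODIFICATION = "Modification"


def first_intervention(
    observations: Sequence[Optional[Intervention]],
) -> Optional[Intervention]:
    """Return the earliest intervention in a chronological list of steps.

    None entries stand for steps that completed without intervention;
    a result of None means the sample counts as "not detected".
    """
    for observed in observations:
        if observed is not None:
            return observed
    return None
```

For the example described above, the chronological observations would be download (no intervention), write (no intervention), read (Access Control) and inventory (Removal), and `first_intervention` would return the Access Control entry.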

Process launches

As the final step in the Real-Time Phase, the VB100 test agent attempts to launch a process for a subset of PE-type samples that meet the specific criteria outlined below. The process is launched in a suspended state using the Windows CreateProcess() API with the CREATE_SUSPENDED flag, and is terminated immediately afterwards. Certain failure codes are attributed to the tested product.

By default, the error codes attributed to product intervention are ERROR_ACCESS_DENIED, ERROR_VIRUS_INFECTED and ERROR_VIRUS_DELETED. This list may be extended on a case-by-case basis at VB’s discretion, subject to agreement with the vendor prior to the test.
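A minimal sketch of how a launch failure might be attributed to the product is shown below. The numeric values are the standard Win32 system error codes for the three names listed above; the function name and the `extra_codes` parameter (standing in for a case-by-case extension of the list) are assumptions of this example.

```python
from typing import Iterable

# Standard Win32 system error codes (winerror.h).
ERROR_ACCESS_DENIED = 5
ERROR_VIRUS_INFECTED = 225
ERROR_VIRUS_DELETED = 226

DEFAULT_INTERVENTION_CODES = frozenset(
    {ERROR_ACCESS_DENIED, ERROR_VIRUS_INFECTED, ERROR_VIRUS_DELETED}
)


def is_product_intervention(win32_error: int,
                            extra_codes: Iterable[int] = ()) -> bool:
    """Decide whether a failed process launch is attributed to the product.

    extra_codes models an agreed extension of the default list.
    """
    return win32_error in DEFAULT_INTERVENTION_CODES or win32_error in set(extra_codes)
```

For instance, `is_product_intervention(193)` (ERROR_BAD_EXE_FORMAT) is false by default, since that code is not on the default list.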

Since the sample set may include various PE-type files (e.g. executables, DLLs, drivers, EFI images), only those with a high likelihood of successfully starting on the test system will be launched. Virus Bulletin conducts a series of checks to ensure this, including verifying that the PE file has a healthy and valid structure, is a stand-alone executable targeting X86/X64 architecture, and is designed for either the GUI or console Windows subsystem.
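The kind of eligibility check described above can be sketched by reading a few PE header fields. This is a deliberately simplified illustration and not Virus Bulletin's actual set of checks, which is not detailed here: it only inspects the DOS and PE signatures, the machine type, the DLL flag and the subsystem. The function name is hypothetical; the constants are the standard PE/COFF values.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000
IMAGE_SUBSYSTEM_WINDOWS_GUI = 2
IMAGE_SUBSYSTEM_WINDOWS_CUI = 3
# Optional-header magic expected for each supported machine type.
EXPECTED_MAGIC = {IMAGE_FILE_MACHINE_I386: 0x10B, IMAGE_FILE_MACHINE_AMD64: 0x20B}


def is_launch_candidate(data: bytes) -> bool:
    """Return True for a stand-alone x86/x64 GUI or console executable."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    optional_header = pe_offset + 24  # 4-byte signature + 20-byte COFF header
    if optional_header + 70 > len(data):
        return False  # truncated before the Subsystem field
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False
    machine, _, _, _, _, optional_size, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    if machine not in EXPECTED_MAGIC or optional_size < 70:
        return False
    if not characteristics & IMAGE_FILE_EXECUTABLE_IMAGE:
        return False
    if characteristics & IMAGE_FILE_DLL:
        return False  # DLLs are not stand-alone executables
    (magic,) = struct.unpack_from("<H", data, optional_header)
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    return magic == EXPECTED_MAGIC[machine] and subsystem in (
        IMAGE_SUBSYSTEM_WINDOWS_GUI,
        IMAGE_SUBSYSTEM_WINDOWS_CUI,
    )
```

Drivers (native subsystem) and EFI images (EFI subsystems) fail the subsystem check, and DLLs fail the characteristics check, so only the file types named above as launchable pass this sketch.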

All VB100 tests use this process launch trigger by default, with exceptions detailed below.

The anatomy of a test

Overview

The product lifecycle in the certification programme begins with an initial product setup, followed by periodic testing. The default testing schedule is quarterly (approximately), unless otherwise agreed by VB and the vendor (within the confines of the general certification criteria).

Periodic tests follow the script below:

Test setup
Data collection
Test case validation
Feedback and disputes
Publication

These steps are explained in the next chapters.

Product setup

Products are set up in the test environment when they are first submitted to the test. Installation is performed on a clean Windows image. The product is configured as per its default settings (exceptions apply, see Product build and configuration policies).

A snapshot of this state is captured for use in testing (“baseline snapshot”). This snapshot is maintained throughout the participation of the product in the test.

Test setup

Shortly before the data collection commences, the following procedure is performed to make sure that the product is ready for testing:

The baseline VM snapshot for the product is restored, and the VM is started.
All recommended Windows updates are installed.
Product changes (such as replacing the last build with a new one, configuration changes requested by the vendor, etc.) are performed by the engineers.
The test system is rebooted if prompted to do so, then shut down.
A new baseline VM snapshot is created, replacing the previous baseline.

Data collection

The VM is started from its baseline state.
Product warm-up is performed.
- The product is allowed time to “warm up”, i.e. to update itself through automatic updates. The product user interface is monitored and any updates prompting for user approval will be approved by the test engineers.
- The test engineers make reasonable efforts to confirm the product has completed its own updates successfully, including launching the update processes manually if needed.
Data collection is completed using the VB100 test agent, as described earlier.
Where available, product logs are captured for vendor reference.
A snapshot of the test VM is taken in its post-test state in case further data needs to be extracted from the system. Note that current storage space constraints may dictate that residual samples (often weighing tens of gigabytes) are removed from the system prior to taking the snapshot.

Test case validation

Data collection is generally conducted using an initial, candidate version of the test case set. This candidate may contain samples that we later deem unsuitable for the test. In this stage, VB may remove samples from the set. Data collected for such samples are ignored, as if the sample wasn’t part of the test set at all.
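The effect of removing samples during validation can be shown with a small sketch. The data layout (a mapping from sample hash to a detected flag) and the function names are assumptions for this example.

```python
from typing import Dict, Set


def validated_results(results: Dict[str, bool], removed: Set[str]) -> Dict[str, bool]:
    """Drop the data collected for samples removed from the test case set."""
    return {sample: detected for sample, detected in results.items()
            if sample not in removed}


def detection_rate(results: Dict[str, bool]) -> float:
    """Fraction of the remaining samples that were detected."""
    return sum(results.values()) / len(results) if results else 0.0
```

Because a removed sample leaves both the numerator and the denominator, a missed sample that is later deemed unsuitable no longer counts against the product: with four candidate samples of which one was missed, removing the missed sample changes the rate from 0.75 to 1.0.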

Feedback and disputes

The vendor receives preliminary feedback on the product’s performance. This marks the beginning of the dispute period, during which the vendor can examine the results and file any disputes that they may have. At least one calendar week is provided for the vendor to complete their independent verification of the results.

Feedback includes at least the following:

List of samples classified as false positives, by hash.
List of samples classified as false negatives (i.e. misses), by hash.

Upon request, VB will also provide:

Captured product log and data files.
False positive or false negative files (binaries). In the case of an excessive number of such binaries, the number of samples shared may be capped.
Log data from Virus Bulletin's own test agent, including high resolution timing information, hashes, error messages and other hard evidence.
The ability to audit the pre-test or post-test state of the VM.

Depending on the outcome of the disputes, further samples may be removed from the test case set in this step.

Publication

Test results are generated against the final state of the test sets, followed by the publication of the test report and a review of the certified status of the product.

Policies

Product build and configuration policies

We aim to make test report data representative for the average use case. Accordingly, we attempt to test with product builds that are available for general audiences, and to configure/operate the products as close to the defaults as possible.

We acknowledge that this is not always possible. In such cases, substantial deviations from the average use case – where they might impact the test results – will be documented in the published test report for full disclosure.

Some examples of such deviations:

Significant: Non-default detection engine configuration (e.g. increased sensitivity, archive scan depth, timeout, etc.).
Significant: Non-default detection action for “grey” cases such as Potentially Unwanted Applications (PUAs).
Significant: Custom product build with fix for a bug affecting our particular test environment.
Not significant: Increased product logging level for better evidence capture.

Please note that VB accepts such proposed changes from the vendors at its discretion. Proposals that would negatively impact the relevance of the report to the reader may be rejected.

Binary classification mapping

Security products often classify threats into various classes, including grey areas like PUAs.

However, the VB100 test relies on binary classification – there is either a hit on a test case, or there is no hit. For such grey area cases, the product vendor is advised to provide VB with configuration and usage instructions that follow the vendor’s view on how these test cases should be classified in the VB100 framework. In lieu of such instructions, test engineers will either follow the defaults or advise the vendor on the recommended settings for best compatibility with the testbed.

Treatment of products with partial feature coverage

The VB100 design assumes that the product has both real-time protection and on-demand file system scanning features available. When that assumption is not applicable to the product (e.g. a command-line scanner with no real-time element, a detect-and-report only product), VB will examine whether it is possible to accommodate the product within the confines of this methodology. For products that only offer reporting and no actions to be carried out on the samples, VB will simulate the action based on the product’s own report.

As such circumstances are of interest to the reader of the test report, special setups like the above will be documented in the test report.

Withdrawal from the test (opting out)

A product cannot be withdrawn once data collection has started. Public interest dictates that a test report is to be published, regardless of whether it’s favourable or not from the vendor’s perspective.

However, VB may, at its discretion, allow withdrawal of a product in extraordinary cases, when compelling reasons suggest that the report would bear no relevance to the public. Examples of such situations are: collected data is proven to be tainted by lab-specific technical issues; significant testing errors have occurred, such as deviations from protocol, etc.

Exceptions to the process launch trigger

Virus Bulletin aims to execute this trigger in all tests. However, certain scenarios may prevent its successful execution or render it unnecessary. In such cases, Virus Bulletin may disable this step in the test after consulting with the vendor. Some examples of such exceptions include:

The tested product terminates or otherwise disables the parent process (i.e. the VB100 test agent), and this behaviour cannot be disabled within the product.
The tested product returns error codes that are ambiguous and cannot be reliably attributed solely to the product’s intervention.
The tested product does not support intervening in process launches.

Some products may exhibit elevated sensitivity to process launches, and this heightened vigilance may lead to a higher false positive rate. This behaviour is considered relevant information for report readers and will not result in the disabling of this trigger.

Technical issue resolution

VB pledges to work with the vendor to resolve technical issues with the product and notify the vendor as soon as possible when such issues are detected.

Note that if troubleshooting involves sharing the samples used in testing through logs or by other means, VB reserves the right to postpone the test until a new test set (one that includes samples not known to the vendor) becomes available for testing.

Technical issues that occur during the data collection phase generally result in the test data being discarded and the test being rescheduled, to prevent reporting based on incomplete, unreliable or assumption-based data. Note that technical issues that impact not just the particular test environment but a wider user base (e.g. cloud outage, erroneous signature updates, etc.) at the time of the test do not qualify as a basis for test invalidation, and Virus Bulletin may proceed to publish the report.

Disputes

Disputes are evaluated on a case-by-case basis. The vendor is asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, VB reserves the right to make the final decision.

To reflect the broad nature of real-life issues, the scope of the disputes is not limited. The following are a few examples:

False negatives (e.g. on the basis of a sample being corrupted, greyware, etc.)
False positives (e.g. corrupted files, greyware, PUAs, etc.)
Technical issues (e.g. the product did not function as expected during the test)

Vendor commentary

Prior to the publication of a report, the vendor of the product can choose to provide commentary to be included in the report notes. This is to ensure that the vendor’s perspective receives a fair representation. Such commentaries can be useful when the report contents are disputed, or if the report fails to disclose significant circumstances for the reader of the report. Commentaries are subject to reasonable length limits (must fit the designated page) and editorial approval.

Product audit

Outside the data collection phase, vendors can request an audit of the product configuration and VM state. This is primarily done through a manual verification of the desired configuration or state (as provided by the vendor); however, remote access can also be arranged at a time suitable for both parties. Remote access is supervised by Virus Bulletin personnel.

Changelog

Version 1.7

Introduced a new process launch trigger in the Real-Time Phase.

Version 1.6

Changed the test platform to Windows 11 from Windows 10.

Version 1.5

Added more details on sample curation.
Introduced a formal policy on vendor commentary in report.
Introduced a formal policy on product audits.
Extended the technical issue resolution policy.

Version 1.4

Removed the ‘Diversity Test’ module.
Introduced test grading and the associated updated test criteria.

Version 1.3

Rewritten from scratch to accommodate the 1.3 methodology changes. Main changes include the introduction of the 1.3 test pipeline and the certification criteria.

Version 1.2

Removed Windows 7 as a test platform.

Version 1.1

Certification criteria update
New test module 'Diversity Test'
Obsolete RAP (Reactive/Proactive) Test removed
Running the test inside virtual machines enabled
Clean file set usage in the test updated
Updates on how false positive and false negative events are counted
Updates on what file types are considered false positives and false negatives
New or updated policies on:
- Acceptance of custom product builds in the test
- Acceptable product updates during testing
- Prerequisites of on-demand and on-access test modes
- Switching between on-demand and on-access test modes
- Sufficient data for certification
Minor wording changes

Version 1.0: First published in this format.

VB100 Methodology - ver.1.7
