VB100 Methodology - ver.1.3

Methodology for the VB100 certification test

Overview

VB100 is a certification test for Windows endpoint security solutions.

 

Certification objective

VB100 seeks to establish whether the static detection layer of the tested product is capable of detecting common, file-based Windows threats, without generating an excessive number of false alarms for legitimate programs.

 

Test process outline

The test exposes the product to both malicious and legitimate program samples, and records the tested product’s response to these samples.

Exposure to samples happens first by downloading and saving each sample to the local file system in the presence of the product. An on-demand scan of the successfully saved samples is then requested from the product. Finally, any remaining samples are inventoried and their integrity is verified.

Products are expected to demonstrate their protection capabilities by intervening (or not intervening) in this process. Any sample that reaches the end of this process without errors or changes is considered “not detected”, whereas all other samples are considered “detected”.

 

Test case sets

Test case sets are compiled frequently to include freshly acquired samples. We aim to base each new test for the product on the most recent test set available. The same test sets may be used for multiple products if they are tested close together in time.

Test case sets are composed of multiple subsets:

  • Certification Set: Common and prevalent Windows malware recently observed in the wild. Set size varies, generally including 1,000-2,000 purely Windows PE-type cases.

  • Clean Set: Widely and less widely used legitimate program files. This set contains 100,000 samples, selected randomly from a much larger repository of samples. This set includes both PE (at least 25%) and miscellaneous file types.

  • Diversity Set: Assorted malicious Windows executables, including less clear-cut cases and more obscure threats. Predominantly PE-type. Typically includes 1,000 samples selected at random from a larger repository.

 

Test results compilation

After completing the test process, the product’s responses are cross-referenced against the corresponding test case sets and are sorted into the following categories:

  1. True positive = malicious test case identified as malicious (malware detected)

  2. True negative = legitimate test case identified as legitimate (legitimate treated as such)

  3. False positive = legitimate test case identified as malicious (false alarm)

  4. False negative = malicious test case identified as legitimate (malware missed)

Rates for each set are determined and verified against the test criteria. A test report is published (regardless of whether the test criteria are met) and the certified status of the product is reviewed.
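The rate calculation above can be sketched as follows. This is an illustrative example, not VB’s actual tooling; the function name and the sample counts are our own.

```python
def rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive per-set rates from the four outcome categories."""
    malicious = tp + fn    # malicious test cases in the set
    legitimate = tn + fp   # legitimate test cases in the set
    return {
        "true_positive_rate": tp / malicious if malicious else 0.0,
        "false_positive_rate": fp / legitimate if legitimate else 0.0,
    }

# Example: a 2,000-sample malicious set with 5 misses
print(rates(tp=1995, tn=0, fp=0, fn=5))  # true_positive_rate: 0.9975
```

Note that the true positive rate depends only on the malicious cases in a set, and the false positive rate only on the legitimate ones.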

 

Criteria

 

Test criteria

A product is considered to have successfully passed a test if the results obtained meet the following criteria:

  • 99.5% or higher true positive rate for the Certification Set, i.e. the product may not miss more than 0.5% of malicious cases for this set.

  • 0.01% or lower false positive rate for the Clean Set.

The Diversity Set is supplementary and does not contribute to the test criteria.
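The two thresholds combine into a simple pass/fail check, which can be sketched as below. The function and parameter names are our own, not VB’s; rates are expressed as fractions.

```python
def meets_test_criteria(cert_tpr: float, clean_fpr: float) -> bool:
    """Pass requires >= 99.5% TPR (Certification Set)
    and <= 0.01% FPR (Clean Set)."""
    return cert_tpr >= 0.995 and clean_fpr <= 0.0001

print(meets_test_criteria(0.998, 0.00005))  # True
print(meets_test_criteria(0.992, 0.0))      # False: too many misses
```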

 

Certification criteria

Certified status can be earned and maintained by products that are tested frequently and successfully meet the test criteria.

  • The product earns its certified status with the publication of a successful (passed) test report.

  • The product retains its certified status until any of the following happens:
    • The product generates two failed tests in a row.
    • The last successful (passed) test report becomes 180 days old.
    • VB’s mandate to certify the product terminates without a successor arrangement, and 60 days have elapsed since the termination date.

 

Detailed test information

 

Test environment

The test is performed on virtual machines (VMs), hosted by a Type I (bare metal) hypervisor. VMs are provisioned with resources approximately equivalent to an average business endpoint PC specification.

The test platform is x64 Windows 10 (edition unspecified), regularly maintained with updates from Microsoft.

 

Detailed test process description

Test data collection is performed by a custom test agent application developed by Virus Bulletin. This agent runs on the test VM, interacts with test case sets, triggers responses from the product, and generates detailed log-based evidence. These logs later serve as the basis for generating the test results.

The testing process has two distinct phases, each of which consists of several steps.

    1. Real-Time Phase. For every sample in the test case set:

      1. The sample is downloaded through HTTP(S) from VB-operated infrastructure into memory. Note that HTTP merely acts as a means to transport samples to the test system and no attempt is made to simulate in-the-wild URLs.

      2. The integrity of the in-memory sample is verified by calculating the SHA-256 hash of the in-memory content and comparing that hash against the expected reference hash.

      3. The sample is written from memory to the file system.

      4. The integrity of the sample in the file system is verified by opening and reading the sample file, calculating its SHA-256 hash and comparing the calculated value against the expected reference hash.

    2. Follow-Up Phase. This phase is started manually. We aim to keep the gap between the phases to a minimum.

      1. Test engineers request the product to perform an on-demand scan on the (remainder of) samples present in the file system. Actions proposed by the product (e.g. quarantining, removal, disinfection, etc.) are carried out.

      2. The test agent is started again to inventory the state of the samples on the system. For each sample, the agent performs an integrity check in precisely the same manner as described in step 1.4.
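Steps 1.2-1.4 of the Real-Time Phase can be sketched as follows. This is a minimal illustration of the hash-verify-persist-reverify pattern, assuming the caller supplies the sample bytes and the expected reference hash; it is not the actual VB100 agent, and in the real agent any interruption here would be logged as an intervention rather than merely reported.

```python
import hashlib
from pathlib import Path

def persist_and_verify(content: bytes, expected_sha256: str,
                       dest: Path) -> bool:
    # Step 1.2: verify the in-memory content against the reference hash
    if hashlib.sha256(content).hexdigest() != expected_sha256:
        return False
    # Step 1.3: write the sample from memory to the file system
    dest.write_bytes(content)
    # Step 1.4: re-read and re-hash to verify the persisted copy
    return hashlib.sha256(dest.read_bytes()).hexdigest() == expected_sha256

sample = b"placeholder sample content, not real malware"
ref = hashlib.sha256(sample).hexdigest()
print(persist_and_verify(sample, ref, Path("sample.bin")))  # True
```

A failure at either hash comparison, or an I/O error while writing or re-reading, is what the methodology counts as a product intervention.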

Products may intervene in this process in several places. Any of the below may be treated as intervention from the product:

  • Network Interception: Transmission/connection termination, HTTP errors and any successful HTTP transmission (HTTP/200) with failed integrity checks for the transmitted content.

  • Access Control: A broad range of I/O errors that prevent opening, reading or writing samples.

  • Removal: The disappearance of successfully persisted samples from the file system (e.g. deletion, quarantining, renaming, etc.).

  • Modification: Integrity check failures after e.g. disinfection, disarming, etc.

Note that a single test case may trigger multiple responses from the product. In such cases, the test deems the product’s first response the most relevant. In a typical example, the tested product may allow a malicious sample to be downloaded and written to the file system but block it from being opened for reading (Access Control). The file is then removed during the subsequent on-demand scan, so the inventory process records a Removal-type intervention for the sample. Because the design considers the first intervention the most relevant, the ultimate test case outcome is the first, Access Control-type intervention.

 

The anatomy of a test

 

Overview

The product lifecycle in the certification programme begins with an initial product setup, followed by periodic testing. The default testing schedule is quarterly (approximately), unless otherwise agreed by VB and the vendor (within the confines of the general certification criteria).

Periodic tests follow the script below:

  • Test setup
  • Data collection
  • Test case validation
  • Feedback and disputes
  • Publication

These steps are explained in the next chapters.

 

Product setup

Products are set up in the test environment when they are first submitted to the test. Installation is performed on a clean Windows image. The product is configured as per its default settings (exceptions apply, see Product build and configuration policies).

A snapshot of this state is captured for use in testing (“baseline snapshot”). This snapshot is maintained throughout the participation of the product in the test.

 

Test setup

Shortly before the data collection commences, the following procedure is performed to make sure that the product is ready for testing:

  • The baseline VM snapshot for the product is restored, and the VM is started.

  • All recommended Windows updates are installed.

  • Product changes (such as replacing the last build with a new one, configuration changes requested by the vendor, etc.) are performed by the engineers.

  • The test system is rebooted if prompted to do so, then shut down.

  • A new baseline VM snapshot is created, replacing the previous baseline.

 

Data collection

  • The VM is started from its baseline state.

  • Product warm-up is performed.

    • The product is allowed time to “warm up”, i.e. to update itself through automatic updates. The product user interface is monitored and any updates prompting for user approval will be approved by the test engineers.

    • The test engineers make reasonable efforts to confirm the product has completed its own updates successfully, including launching the update processes manually if needed.

  • Data collection is completed using the VB100 test agent, as described earlier.

  • Where available, product logs are captured for vendor reference.

  • A snapshot of the test VM is taken in its post-test state in case further data needs to be extracted from the system. Note that current storage space constraints may dictate that residual samples (often totalling tens of gigabytes) are removed from the system prior to taking the snapshot.

 

Test case validation

Data collection is generally conducted using an initial, candidate version of the test set. This candidate may contain samples that we later deem unsuitable for the test. In this stage, VB may remove samples from the set. Data collected for such samples is ignored, as if the samples had never been part of the test set at all.
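Discarding data for removed samples amounts to filtering the collected results down to the final set. A sketch, assuming results are keyed by sample hash (the data model here is our assumption, not VB’s):

```python
def finalize(results: dict[str, str], final_set: set[str]) -> dict[str, str]:
    """Keep only data for samples that remain in the validated test set."""
    return {h: outcome for h, outcome in results.items() if h in final_set}

raw = {"aaa": "detected", "bbb": "not detected", "ccc": "detected"}
print(finalize(raw, {"aaa", "ccc"}))  # {'aaa': 'detected', 'ccc': 'detected'}
```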

 

Feedback and disputes

The vendor receives preliminary feedback on the product’s performance. This marks the beginning of the dispute period, during which the vendor can examine the results and file any disputes that they may have. At least one calendar week is provided for the vendor to complete their independent verification of the results.

Feedback includes at least the following:

  • List of samples classified as false positives, by hash.
  • List of samples classified as false negatives (i.e. misses), by hash. In the case of an excessive number of false negatives, the number of samples shared may be capped.

Upon request, VB may also provide:

  • Captured product log and data files.
  • False positive or false negative files.
  • The ability to retrieve further data from the post-test VM snapshot.

Depending on the outcome of the disputes, further samples may be removed from the test case set in this step.

 

Publication

Test results are generated against the final state of the test sets, followed by the publication of the test report and a review of the certified status of the product.

 

Policies

 

Product build and configuration policies

We aim to make test report data representative for the average use case. Accordingly, we attempt to test with product builds that are available for general audiences, and to configure/operate the products as close to the defaults as possible.

We acknowledge that this is not always possible. In such cases, substantial deviations from the average use case – where they might impact the test results – will be documented in the published test report for full disclosure.

Some examples of such deviations:

  • Significant: Non-default detection engine configuration (e.g. increased sensitivity, archive scan depth, timeout, etc.).
  • Significant: Non-default detection action for “grey” cases such as Potentially Unwanted Applications (PUAs).
  • Significant: Custom product build with fix for a bug affecting our particular test environment.
  • Not significant: Increased product logging level for better evidence capture.

Please note that VB accepts such proposed changes from the vendors at its discretion. Proposals that would negatively impact the relevance of the report to the reader may be rejected.

 

Binary classification mapping

Security products often classify threats into various classes, including grey areas like PUAs.

However, the VB100 test relies on binary classification – there is either a hit on a test case, or there is no hit. For such grey area cases, the product vendor is advised to provide VB with configuration and usage instructions that follow the vendor’s view on how these test cases should be classified in the VB100 framework. In the absence of such instructions, test engineers will either follow the defaults or advise the vendor on the recommended settings for best compatibility with the testbed.

 

Treatment of products with partial feature coverage

The VB100 design assumes that the product has both real-time file system protection and on-demand file system scanning features available. When that assumption is not applicable to the product (e.g. a command-line scanner with no real-time element, a detect-and-report only product), VB will examine whether it is possible to accommodate the product within the confines of this methodology. For products that only offer reporting and no actions to be carried out on the samples, VB will simulate the action based on the product’s own report.

As such circumstances are of interest to the reader of the test report, special setups like the above will be documented in the test report.

 

Sufficient data

A test is considered valid if the data collected for the Certification and Clean Sets is healthy, i.e. complete and free of collection errors. Since the Diversity Set is supplementary, issues with that particular set do not invalidate a report.

 

Withdrawal from the test

A product cannot be withdrawn once data collection has started. Public interest dictates that a test report is to be published, regardless of whether it is favourable from the vendor’s perspective.

However, VB may, at its discretion, allow withdrawal of a product in extraordinary cases, when compelling reasons suggest that the report would bear no relevance to the public. Examples of such situations are: collected data is proven to be tainted by lab-specific technical issues; significant testing errors have occurred, such as deviations from protocol, etc.

 

Technical issue resolution

VB pledges to work with the vendor to resolve technical issues with the product and notify the vendor as soon as possible when such issues are detected.

Note that if troubleshooting involves sharing the samples used in testing through logs or by other means, VB reserves the right to postpone the test until a new test set (one that includes samples not known to the vendor) becomes available for testing.

 

Disputes

Disputes are evaluated on a case-by-case basis. The vendor is asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, VB reserves the right to make the final decision.

To reflect the broad nature of real-life issues, the scope of the disputes is not limited. The following are a few examples:

  • False negatives (e.g. on the basis of a sample being corrupted, greyware, etc.)
  • False positives (e.g. corrupted files, greyware, PUAs, etc.)
  • General performance issues (e.g. the product did not function as expected during the test)

 

Changelog

Version 1.3:

  • Rewritten from scratch to accommodate the 1.3 methodology changes. Main changes include the introduction of the 1.3 test pipeline and the certification criteria.

Version 1.2

  • Removed Windows 7 as a test platform.

Version 1.1

  • Certification criteria update
  • New test module 'Diversity Test'
  • Obsolete RAP (Reactive/Proactive) Test removed
  • Running the test inside virtual machines enabled
  • Clean file set usage in the test updated
  • Updates on how false positive and false negative events are counted
  • Updates on what file types are considered false positives and false negatives
  • New or updated policies on:
    • Acceptance of custom product builds in the test
    • Acceptable product updates during testing
    • Prerequisites of on-demand and on-access test modes
    • Switching between on-demand and on-access test modes
    • Sufficient data for certification
  • Minor wording changes

Version 1.0: First published in this format.

 

 
