How much malware is really out there?


Robert Sandilands

Commtouch, USA
Editor: Helen Martin


‘For most people even a single piece of malware is too much – especially if they are currently affected by it.’ Robert Sandilands, Commtouch

Since the beginning of the AV industry more than two decades ago, the amount of malware in existence has been an often-debated point. Answers range from none to an infinite amount.

If you use a custom operating system on custom hardware, running applications that matter to nobody but yourself, then you are probably right to assert that there is no malware. There is unlikely to be any financial (or other) incentive to attack such a system.

To get closer to a real answer we need to look a bit further than this contrived example – although there may be some truth in the observation that there is only as much malware as people want to know about. Whether that leaves you with no malware or with an infinite amount depends on the perspective. For most people even a single piece of malware is too much – especially if they are currently affected by it.

Let us assume that any platform that is somewhat accessible and has either a large enough user base or great enough value will eventually be attacked by malware. We can see this in the recent growth of Mac malware, and in Stuxnet, which (probably) attacked a single but very high-value target.

In the mid-1990s we could accurately count the number of viruses that had been seen. This was possible for several reasons:

  1. The number of new viruses was small enough for each sample to be identified and analysed in detail.

  2. It was easy to determine which part was virus and which part was the infected application.

  3. The size and complexity of the malware was quite limited.

If you took one of the polymorphic file infectors from the 1990s and infected 100 million clean files, you could get 100 million unique infected files. If you then counted those files the way most people count malware today, you could say that you had 100 million pieces of malware. This would be incorrect, since every file carries the same virus, but it is how malware tends to be counted these days.
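The difference between the two ways of counting can be sketched in a few lines of Python. This is a toy illustration only, under loud assumptions: the 'virus' is a harmless fixed byte string, the 'polymorphism' is a one-byte XOR key chosen per infection, and the names `VIRUS_BODY`, `infect`, `decode_body` and `count_samples` are hypothetical. Real polymorphic infectors were far more elaborate, and recovering the body from a real sample is the hard analysis work that this sketch reduces to a single XOR.

```python
import hashlib
import random

# Hypothetical stand-in for a virus body; harmless bytes, not real malware.
VIRUS_BODY = b"TOY-VIRUS-BODY"


def infect(host: bytes, key: int) -> bytes:
    """Append the key byte and the XOR-encoded body to a clean host file."""
    encoded = bytes(b ^ key for b in VIRUS_BODY)
    return host + bytes([key]) + encoded


def decode_body(sample: bytes) -> bytes:
    """Undo the toy encoding: read the key, then XOR the tail back."""
    tail = sample[-(len(VIRUS_BODY) + 1):]
    key, encoded = tail[0], tail[1:]
    return bytes(b ^ key for b in encoded)


def count_samples(samples):
    """Return (count by unique file hash, count by decoded virus body)."""
    by_hash = {hashlib.sha256(s).hexdigest() for s in samples}
    by_body = {decode_body(s) for s in samples}
    return len(by_hash), len(by_body)


rng = random.Random(0)  # fixed seed so the run is repeatable
hosts = [f"clean file {i}".encode() for i in range(1000)]
samples = [infect(h, rng.randrange(1, 256)) for h in hosts]

unique_hashes, unique_viruses = count_samples(samples)
print(unique_hashes, "unique files,", unique_viruses, "virus")
```

Counting by file hash reports one 'piece of malware' per infected file, while counting by decoded body reports a single virus. Scaling the 1,000 hosts up to 100 million changes the first number but not the second.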

There are several reasons why malware is now counted by unique file rather than by virus. The first is that modern malware is probably several orders of magnitude larger and more complex than the malware that was around in the mid-1990s. The second major reason is the use of packers to obfuscate the malware. The last, and probably most important, reason is the location of the polymorphic engine. In the 1990s the engine was inside the virus itself; today it sits on the server, where analysts generally cannot access it.

In the old days we could carefully replicate most pieces of malware in a protected and isolated environment and gain a good understanding of how each one morphed. We could therefore use a small number of very efficient signatures to detect those pieces of malware.

These days most pieces of malware won’t work without what appears to be a real Internet connection. They generally also won’t replicate. To get a ‘replicated’ copy you either have to be reinfected or download a new copy of the malware.

Not only that, but analysing a specific piece of malware in detail can take weeks to months. For example, people are still busy analysing Stuxnet more than six months after the initial samples were found and we don’t yet have a complete picture of the malware. Given the flood of malicious files we receive, we are rarely, if ever, able to spend such amounts of time on any specific malware or malware family.

We have a catch-22 situation. If we don’t take the time to analyse the malware and recognise that we are actually working with a limited set of malware families, then we are dealing with a virtually infinite amount. If we do take the time to understand each malware family, then proper detection for the family will take significantly longer than our customers will accept. In the end it is all about doing it fast or doing it well. You can rarely do both.


