The dark side of whitelisting

2007-08-01

Dr Vesselin Bontchev

FRISK Software International, Bulgaria
Editor: Helen Martin

Abstract

Dr Vesselin Bontchev shares his views on whitelisting and why conventional anti-virus scanners will be around for a long time to come.


Introduction

For the past year and a half, Robin Bloor has repeatedly posted articles to his blog and to various other places on the web claiming that ‘anti-virus’ is dead, dying slowly, or at least in serious trouble. One of these articles [1] caught my attention. There were so many wrong things in it that I could not resist posting a comment in response. Later it occurred to me that such articles might mislead some people, so I decided it was worth writing an article on this subject myself – something people could use as a reference for debunking such delusions.

Like many people who are not anti-virus specialists, Mr Bloor understands the term ‘anti-virus’ essentially to mean known-malware scanners. Of course, this is by no means the only kind of anti-virus technology in existence – it is only the most widely used one. In [2] I list seven main types of anti-virus programs, some of which are further subdivided into half a dozen subtypes – and that source is already out of date, having been written more than a decade ago.

Mr Bloor seems to have become disenchanted with ‘anti-virus’ because it has become difficult for known-malware scanners to cope with the constantly increasing flood of new malware. For instance, the Storm family of Trojan horses uses what we call ‘server-side polymorphism’ – every time someone downloads one, the server generates a new variant, resulting in about 54,000 variants released in just one week [3]. Unlike conventional polymorphism found in viruses, this malware is not self-replicating, and the polymorphic engine is not present in the malware itself for the anti-virus researcher to analyse and develop generic detection for. Of course, this does not mean that there are no efficient techniques against it (see [3] for details); nowadays there are additional techniques such as sandboxing.

Conventional scanners are notoriously bad at coping with non-replicating malware. A virus replicates – so once the AV developer implements detection of it in the scanner, other users will be protected. Trojans, however, are usually one-shot weapons: by the time the anti-virus developers receive a sample, the victim is already compromised, and the same trojan is unlikely to be used again against somebody else, so implementing detection of it isn’t of much help to anyone.

Of course, the fact that the number of known malware programs is increasing at an ever faster rate has prompted many people to predict, over the past couple of decades, that at some point conventional scanners will ‘die’ – i.e. will become ineffective. Nevertheless, conventional scanners are still very much alive and kicking – although the technology behind them has improved significantly (albeit in ways not immediately obvious to the user). There are perfectly good reasons for this – conventional scanners are the kind of anti-virus programs with which the average user is most comfortable, in the sense that he or she can easily understand and use them. I am pretty sure that another decade from now conventional scanners will still be in use and people will still be predicting their imminent demise.

Another thing about conventional scanners that Mr Bloor does not like (and he is certainly not alone in this) is that they require constant updating. That is, the user has to keep paying for them.

Anyway, since Mr Bloor has decided that conventional anti-virus does not work any more, he is left with the question of what to replace it with – because something has to be used to protect the user from malware, right? In his case, the panacea seems to be the method known as whitelisting.

What is whitelisting?

If you consider how conventional scanning works, it basically builds a blacklist – a list of known bad programs that shouldn’t be allowed to run on the user’s computer. As new malicious programs appear, they are added to the blacklist – this is why scanners need constant updating.

Whitelisting is the opposite of this. Instead of building a list of things that should not be allowed to run, the anti-virus program uses a list of things that should be allowed to run – and denies execution of everything that is not on this whitelist.

The idea of whitelisting is certainly not new. For a long time, people have been joking that the number of known malicious programs is increasing so quickly that at some point it will be easier to scan for known good programs instead. As early as 1990, Dr Fred Cohen [4] proposed the idea of ‘integrity shells’ – programs that would only allow the execution of software with known-good integrity (using a database of checksums to control integrity and relying on the fact that viruses have to modify the programs they infect, thus destroying their integrity). I use the Kerio personal firewall in ‘paranoid’ mode, which makes it issue an alert every time an ‘unapproved’ program is executed – this is a kind of whitelisting, too.
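
To make the mechanism concrete, here is a minimal sketch (in Python, purely for illustration) of the checksum-based idea that underlies both Cohen’s integrity shells and whitelisting in general: compute a cryptographic hash of the program about to run and refuse execution unless that hash is on the list of approved programs. The database file name is hypothetical, and a real product would of course hook the operating system’s program loader rather than be invoked by hand.

```python
import hashlib
import sys

def file_digest(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def load_whitelist(db_path):
    """The whitelist: one approved digest per line (hypothetical format)."""
    with open(db_path) as f:
        return {line.strip() for line in f if line.strip()}

def may_execute(program_path, whitelist):
    """Default-deny: a program may run only if its digest is on the list."""
    return file_digest(program_path) in whitelist

if __name__ == '__main__':
    approved = load_whitelist('whitelist.db')   # hypothetical database file
    program = sys.argv[1]
    verdict = 'permitted' if may_execute(program, approved) else 'DENIED'
    print('execution %s: %s' % (verdict, program))
```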

Since whitelisting has been around for almost two decades, the natural question to ask is: if it is so much better than conventional scanning, why has it not replaced the latter already? The answer, of course, is because it cannot. It, too, is fraught with problems. On its own, whitelisting cannot stop malware – just as, on its own, conventional scanning (or other ‘blacklisting’-based approaches) cannot.

The problems of whitelisting

Basically, the problems of whitelisting boil down to two issues: ‘what is a program?’ and ‘which programs are good?’ [5]. (Just as the problems of any blacklisting-based approach boil down to two issues: ‘what is a program?’ and ‘which programs are bad?’)

Exotic execution exceptions, or what is a program?

Theoretically, the question of 'what is a program?' is an unsolvable one. That is, for any given sequence of symbols, there exists at least one Turing Machine for which this sequence is a program, if it appears somewhere on the machine’s tape.

Of course, real computers are not Turing Machines (a Turing Machine has infinite memory, for instance) – but we hit another snag there. All contemporary computers are based on the von Neumann architecture – and an underlying principle of this architecture is the equivalence between code and data. That is, one program’s code is another program’s data and vice versa. For instance, a JavaScript text is a program to the browser – but it is data to the editor used to create it.

The most trivial approach to whitelisting is to build a database of known-good, frequently used executable files – that is, EXE, COM, BAT and maybe a few kinds of scripts – and deny execution to any executable program not listed in the database. Sadly, this leaves the door open for many other ways in which a piece of malicious code could enter.

For instance, what about Office macros? Of course, once the producer of a whitelist-based protection program has thought about them, it is possible to make the program monitor macros in documents, too. But how many producers would think of them off-hand? And how many have the competence to handle obscure file formats – because you have to get to the macro bodies; you can’t just whitelist some Word documents and deny access to all other documents. And macros are just the beginning.

What about obscure scripting languages? Most producers would probably think of JavaScript and VBScript – but there are so many others, and most of them are sufficiently powerful to use to write viruses. For instance, what about the scripting languages of the various IRC clients (e.g. mIRC)? There are already many viruses written in these languages. Recently, a virus appeared that was written in the scripting language of the hex editor WinHex – how many producers of whitelist-based protections have heard of this scripting language? Other obscure scripting languages for which viruses exist include (but are not limited to): ABAP, ActionScript, Ami Pro macros, AutoLISP, Corel Draw! Script, Ferite, IDA script, KIX, Lua, MATLAB script, One C, WinBatch, REG script, SQL, Perl, Python, Ruby… (Please don’t shoot me for calling the latter four ‘obscure scripting languages’.)

Furthermore, what about Office exploits? They arrive in a Word, Excel or PowerPoint document. Some obscure field in the document is corrupted, causing a buffer overflow somewhere in the Office application that opens it. This corruption is the exploit. The exploit causes control to be transferred to a small piece of code that also resides in the document (usually, but not always, close to the corrupted field). This small piece of code is called ‘shellcode’. The shellcode usually extracts the real malicious program appended (often in encrypted form) to the end of the document – or downloads it from somewhere – and runs it.

Now, a whitelist-based approach can prevent the dropped (or downloaded) executable from running. But it cannot stop the execution of the shellcode – not unless it stops the Office applications from running or disallows the opening of foreign documents, both of which would make the machine essentially unusable. And the shellcode doesn’t really have to drop an executable – it is just easier to implement it this way. The shellcode runs directly in memory, in the context of the user who has opened the malicious document, and can do everything that the user is allowed to do. There is no hope of a whitelist-based approach preventing that.

And what about threats that take control before the whitelist-based protection has had the chance to execute and then use stealth to disguise their presence? A boot sector virus would be the obvious example but there are many other convoluted ways in which malware can get itself executed during the computer’s startup process and before any whitelist-based protection.

As if all this wasn’t already bad enough – what about threats that do not exist as files at all? A typical example is the CodeRed virus [6]. It enters the attacked system as an HTTP GET request to port 80, exploiting a vulnerability in one of the system DLLs. It is never saved as a file (or any other object on the disk) – instead, it spreads memory-to-memory between the vulnerable machines on the internet.

How would a whitelist-based program protect against that? Even a blacklist-based approach (i.e. a scanner) has trouble with this kind of virus, because there are no obvious files to scan. But at least one can implement a packet scanner (again a blacklist-based approach) and block the requests sent by the virus before they reach the vulnerable DLL. Whitelisting network packets is a hopeless task.
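
By way of contrast, the blacklist-based packet scanner just mentioned is straightforward to sketch. CodeRed’s exploit arrives as a GET request for the vulnerable .ida ISAPI extension, followed by a long run of padding characters and %u-encoded shellcode, so even a crude pattern match on incoming HTTP requests can drop it. The pattern below is a deliberate simplification for illustration only – a real signature is considerably more precise.

```python
import re

# Simplified signature: the CodeRed request asks for /default.ida followed by
# a long run of padding characters ('N' in the original, 'X' in CodeRed II)
# and then %u-encoded shellcode. Real IDS signatures are considerably stricter.
CODERED_SIGNATURE = re.compile(rb'GET /default\.ida\?[NX]{20,}')

def looks_like_codered(raw_http_request: bytes) -> bool:
    """Return True if the raw request matches the (simplified) signature."""
    return bool(CODERED_SIGNATURE.search(raw_http_request))

# Illustrative, shortened request in the style of the worm's exploit:
sample = b'GET /default.ida?' + b'N' * 224 + b'%u9090%u6858... HTTP/1.0\r\n'
print(looks_like_codered(sample))   # True -- such a request would be dropped
```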

Which programs are good?

The question of whether a program performs only legitimate actions is again an unsolvable one, in the general sense – just like, in general, it is impossible to answer the question of whether a program is a virus or whether it will stop after a finite number of steps (the so-called Halting Problem). The proof that these questions are unsolvable is a constructive one. That is, if someone claims to have invented an algorithm that can solve any one of these questions, the proof shows how to construct a program for which the algorithm will give incorrect results.
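
The flavour of that constructive proof can be conveyed with a toy sketch. Assume, purely for the sake of argument, a hypothetical function is_malicious() that always answers correctly; one can then write a program that asks the detector about itself and does the opposite of whatever the verdict implies, so the detector is necessarily wrong about it:

```python
# Toy sketch of the diagonalization argument. The detector below is
# hypothetical -- the whole point of the proof is that it cannot exist.

def is_malicious(program_source: str) -> bool:
    """A hypothetical, always-correct detector of malicious behaviour."""
    raise NotImplementedError('no such algorithm exists')

# A program constructed to contradict the detector's verdict about itself:
CONTRARY_PROGRAM = """
if is_malicious(CONTRARY_PROGRAM):
    pass                     # declared malicious -> behave perfectly harmlessly
else:
    delete_all_files()       # declared clean -> misbehave (hypothetical action)
"""

# Whatever is_malicious(CONTRARY_PROGRAM) returns, the program's actual
# behaviour is the opposite of the verdict, so the 'perfect' detector fails.
```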

But, again, the above is just theory. In practice, there are other, more immediate problems. Let us suppose that you have decided to protect your computer from malware using a whitelist-based approach. For this, you need two things – a program that implements the approach and a whitelist (i.e. a database of legitimate programs that the program would allow to execute, while blocking everything else). OK, the program you purchase from some software producer – but what about the whitelist?

There are three possible solutions: you can purchase such a whitelist from somewhere; you can build one yourself; or you can use a combination of the first two approaches. Unfortunately, each of these approaches has its own problems.

Scalability, or the problems of global whitelists

Let us suppose that you decide to obtain your whitelist from a third party. This could be the same producer who sold you the whitelist-based program – or it could be an independent vendor. In both cases, however, you are most certainly not the only customer this vendor has, and the other customers most certainly don’t use exactly the same ‘good’ programs that you do.

So, in order to serve all its customers, the whitelist vendor will have to compile some kind of global whitelist – a list of all legitimate programs in existence – in order to make sure that all programs their customers are likely to have are covered. In fact, there are already several companies trying to do just that – Securewave, Bit9, AppSense, etc. Unfortunately, as it turns out, their task is much, much more difficult than the task of the anti-virus companies – and for the same reasons, too.

At the recent International Antivirus Testing Workshop in Reykjavik, Iceland, I attended a presentation by Mario Vuksan from Bit9 [7]. According to him, the company is trying to build just such a global whitelist – and is having a very hard time doing it. As it turns out, there are many more legitimate programs than malicious programs out there – many orders of magnitude more, as a matter of fact. Worse, the rate of creation of legitimate programs far exceeds the rate of creation of malicious ones – and it keeps increasing.

According to Mr Vuksan, just Microsoft, IBM, SourceForge and Mozilla.Org produce, respectively, 500 K, 100 K, 500 K and 250 K new executables every day! Currently, Bit9 has 2.7 billion files listed in its global whitelist, aiming for 10 billion (that’s US billion) – and is nowhere near finished. Just the index of the database is more than a hundred gigabytes. So, the joke about the malicious programs soon outnumbering the legitimate ones is just a joke – there is no chance of this happening any time soon (or even at all).
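
A little back-of-the-envelope arithmetic, using only the figures quoted above, shows the scale of the problem (the bytes-per-entry value is my own rough assumption, used merely to put the numbers in perspective):

```python
# Figures quoted above (Vuksan, 2007)
new_per_day  = 500_000 + 100_000 + 500_000 + 250_000  # Microsoft + IBM + SourceForge + Mozilla
files_listed = 2_700_000_000                          # entries already in Bit9's whitelist
index_size   = 100 * 2**30                            # the '>100 GB' index, taken as 100 GiB

print(f'{new_per_day:,} new executables per day from just four producers')
print(f'{new_per_day * 365:,} per year from those four alone')
print(f'{index_size / files_listed:.0f} bytes of index per whitelisted file')
# Assumption: ~40 bytes of index per new entry
print(f'{new_per_day * 40 / 2**20:.0f} MiB of index growth per day')
```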

Clearly, the glut problem faced by a global whitelist producer is much worse than the problem faced by the average scanner producer. After all, we have to deal with ‘only’ about 5,000+ malicious programs per month.

This leads to other problems as well. First, how is the company going to deliver this database (the global whitelist) to its customers’ computers? Delivering the whole of it is clearly out of the question – who will be willing to dedicate more than a hundred gigabytes to the database used by their malware protection? Not to mention that it will have to be updated with a couple of million new entries every day – much worse than the oh-so-hated regular scanner updates. (And the producer is unlikely to be willing to update it for free, either.)

Keeping it on the servers of the database producer and having it accessed remotely is not good, either. What is the producer going to tell its customers – ‘sorry, you can’t use your computers today, because we (or you) have a network failure and we can’t check right now whether the program you want to run is legitimate or not’?

And, can you honestly believe that a company examining a couple of million new executables per day is not going to make mistakes and put malicious ones on the list? Even the anti-virus people tend to make an occasional mistake when telling the good programs from the bad – and these people are experts and their workload is orders of magnitude lower!

In addition to the technical problems, a global whitelist raises a political one – pretty much like centralized code signing. It is potentially dangerous to allow one entity to control the process. On the one hand, how can you be sure that a small producer’s program will be included? And, on the other hand, how can you be sure that a big company like Sony BMG will not pay the whitelist producer to include its latest rootkit as a legitimate program in their database?

User (in)competence, or the problems of local whitelists

The alternative to using a whitelist somebody has built for you is to use one you have built yourself. Then you do not have the problem of millions of new legitimate programs per day – because the software installed on your computer is, more or less, constant. Sadly, this approach has problems of its own.

The main problem is that it relies on the user being competent enough to build such a whitelist. It is much more difficult than just running some sort of program that would scan your disks for executable programs and build a database of checksums for them.
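
The naive ‘scan the disks and checksum everything’ program just mentioned is indeed trivial to write – here is a sketch in Python (the root path, the database file name and the set of ‘executable’ extensions are assumptions, chosen only for illustration). The catch, as the following paragraphs explain, is that such a builder blindly trusts whatever is already installed:

```python
import hashlib
import os

EXECUTABLE_EXTENSIONS = {'.exe', '.com', '.bat', '.dll'}   # illustrative, far from complete

def file_digest(path):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def build_local_whitelist(root, db_path):
    """Walk the disk and record a checksum for every 'executable' found.

    Everything already present is trusted blindly -- which is exactly
    the problem discussed below.
    """
    with open(db_path, 'w') as db:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if os.path.splitext(name)[1].lower() in EXECUTABLE_EXTENSIONS:
                    try:
                        db.write(file_digest(os.path.join(dirpath, name)) + '\n')
                    except OSError:
                        pass    # unreadable file: silently skipped, another gap

# Hypothetical invocation:
# build_local_whitelist('C:\\', 'whitelist.db')
```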

For instance, how do you know that your system is not already infected? For most users, the only way to know this is by running a virus scanner. Oops, there goes the hope that conventional anti-virus is ‘dead’. In addition, many users resort to seeking some kind of protection from malware only after their machines become infected.

Furthermore, what are you going to do when you want to install a new program? Obviously, you will have to update the whitelist in order to record the new program as legitimate and allow it to be executed. Of course, in order to do that, you will have to be able to decide whether the new program is legitimate. But if users in general were able to do that, they wouldn’t get infected in the first place and there wouldn’t be any need for anti-virus software (no matter whether whitelist- or blacklist-based) – because everybody would be installing only legitimate programs on their computers!

This approach is applicable only to tightly controlled corporate environments, where security is paramount (and takes precedence over usability), where the local whitelist is built by a competent security administrator, contains a relatively small number of programs, and the users are strictly forbidden from installing and running anything new. Sadly, it is totally unusable in home-user (and even in most corporate) environments.

The sum is less than its parts, or the problems of the combined approach

If neither of the above two approaches really works, then how about a combination of them? For instance, the whitelist-based protection vendor could supply you with a relatively small database of ‘most popular legitimate programs’ and also allow you to update the database locally with any programs that you use but which are not included in it. Maybe the combination of the two approaches will cancel out each other’s deficiencies?

Unfortunately, this is not the case. The combined approach simply combines the problems of the two approaches described above. For instance, who decides what ‘the most popular programs’ are? They are not a static set, either – so the database will have to keep growing (or at least keep changing), which means that the product will have to be constantly updated, just like a conventional scanner.

And if you are allowed to put your own programs on the whitelist, sooner or later you will ‘whitelist’ some malicious program by mistake or through lack of competence.

Why people mostly use scanners

Malicious programs have been with us for more than a quarter of a century. Yet people still mostly use known-malware scanners to protect themselves from such programs. Now, I would be the first to admit that scanners are the weakest kind of protection from malware. Why, then, do people still rely on them almost exclusively?

It is not, as some people would have you believe, because of some kind of dark conspiracy among the anti-virus producers who want to keep getting money from you for their updates. It is because this is what the free market has established, no matter whether we like it or not.

The fact is that the average user is not interested in becoming a security expert. The users just want to be left alone doing their jobs, playing games, surfing the web. They start thinking about security only when some malware bites them.

A known-malware scanner is something the user can easily understand and use. It tells the user ‘no, your computer is not infected’ or ‘yes, your computer is infected with the XYZ virus; do you want me to remove it?’. As opposed to that, the other kinds of anti-malware protection schemes require a significant level of competence from the user, in order to be understood and used correctly.

A heuristic analyser would say, ‘The file Foo may contain a virus’. Well, does it, or doesn’t it? A firewall would say, ‘The program svchost.exe is communicating over port 1900’. What the heck does that mean and should it be permitted? A behaviour blocker would say, ‘The program msvc.exe is trying to write to file Blah.exe’. Is that a virus attack? A whitelist-based protection would either occasionally deny the running of a program the user wants to run, or would ask the user whether the program is legitimate and should be added to the whitelist – a decision the user is generally not equipped to make.

This is why most users keep buying scanners – because they are easy to understand and easy to use. Yes, they are the weakest line of defence against malware – but this is what the users want to buy. And since we, the anti-virus producers, have to eat too, this is what we have been making and selling.

We would gladly sell the users something more secure and some of us have tried to do so over the past couple of decades. Does anybody remember Fred Cohen’s Integrity Shell? The program Untouchable? Integrity Master? These were all generic, integrity-based products that were very efficient at stopping virus infections. Nevertheless, none of these products are manufactured any longer. The users voted with their wallets and the companies making these products either went out of business or switched to something else. The same will happen to whitelisting, which is essentially a form of integrity checking. Any company that makes a product based exclusively on this approach will ultimately fail – because it will sell to only a relatively very small set of customers. Some anti-virus companies will include whitelisting-based protection in their suites – and the users, in general, will happily keep using only the scanner part of these suites – the part they understand.

Use the Force, Luke

From what I have written so far about whitelisting, some readers could be left with the impression that it is a very bad idea, that it does not work and should be avoided like the plague. Nothing could be further from the truth.

As mentioned in the previous section, whitelisting is essentially a form of integrity checking – and I am a very strong proponent of integrity-based malware protection schemes, because malware (and especially the computer virus) is essentially an integrity problem.

With this article I am just trying to emphasize that whitelisting alone is incapable of stopping malware efficiently and is fraught with problems – just like blacklisting, only with different kinds of problems. The proper way to protect against malware is by implementing defence in depth, not by relying on any single approach.

For example, use scanning to ensure that the system is initially malware-free. Use an integrity checker to ensure that it is not modified without authorization at a later date. Use behaviour blockers to detect intrusions. Use a personal firewall – not only to protect from external attacks but also as a kind of behaviour blocker – to detect if a program on your machine is trying to ‘phone home’ without your knowledge. Use sandboxing to isolate and misdirect potentially troublesome programs. Use encryption to hide the information that does not have to be visible all the time. Use backups!

Unfortunately, I understand very well that this is all just wishful thinking on my part. While a handful of geeks like me might be willing to understand what all of the above means and know how to install and operate it successfully on their computers, it will remain forever way over the head of the average user, who will happily continue using inefficient known-malware scanners.

Conclusion

Whitelisting, per se, is totally incapable of replacing conventional scanners as far as the general public is concerned. It does work well in small and tightly controlled environments, where security is more important than convenience, but the average home user (and most corporate users, as well) will never be able to rely on it exclusively.

It would be best if whitelisting were combined with blacklisting (and with other anti-virus techniques) to establish defence in depth. Sadly, most users do not have the competence required to build, understand and maintain such a defence – while they are able to understand and use scanners. This is why conventional scanning will be with us for a long, long time.

Bibliography

[1] Robin Bloor. The decline of antivirus and the rise of whitelisting. http://www.theregister.co.uk/2007/06/27/whitelisting_v_antivirus/.

[2] Vesselin Bontchev. Methodology of Computer Anti-Virus Research. Ph.D. thesis. University of Hamburg, Germany, Chapter 4.

[3] Michael Venable, Andrew Walenstein, Matthew Hayes, Christopher Thomson, Arun Lakhotia. Vilo: a shield in the malware variation battle. Virus Bulletin, June, 2007, pp.5–9.

[4] Fred Cohen. Automated Integrity Maintenance for Viral Defense. IFIP-TC11 Computers & Security, 1990.

[5] Kurt Wismer. The rise of whitelisting. Available from http://anti-virus-rants.blogspot.com/2006/03/rise-of-whitelisting.html.

[6] Symantec Security Response. CodeRed Worm. Available from http://www.symantec.com/security_response/writeup.jsp?docid=2001-071911-5755-99&tabid=2.

[7] Mario Vuksan. Building and Leveraging White Database for Antivirus Testing. International Anti-Virus Testing Workshop, Reykjavik, Iceland, May 2007. Available from http://www.slideshare.net/frisksoftware/building-leveraging-white-database-for-antivirus-testing/download.
