Improving proactive detection of packed malware

2006-03-01

Adrian Stepan

Microsoft, USA

Editor: Helen Martin

Abstract

Dynamic translation is a technique that can be used instead of emulation for decrypting complex malware. Adrian Stepan describes how the technique can also be used to perform generic unpacking.

Table of contents

Introduction
Of old and new obfuscation techniques
From specific to generic unpacking
Overcoming the speed limitation
Some specific aspects of generic unpacking
Conclusion

Introduction

Malware writers have always striven to create malware in such a way that it will evade detection by anti-virus software. Code obfuscation is one of the methods they use to achieve this. Over time, obfuscation techniques have evolved from simple encryption to polymorphism, metamorphism and packing. The latter is becoming increasingly popular these days, with lots of packing tools available to malware writers, and more of them being created each month.

A lot of developmental effort is required for anti-virus engines to provide unpacking support for all these packers in a timely manner. Generic unpacking is one solution to this problem, but implementing it is not a trivial task. The amount of computing power required by some unpacking routines is beyond the amount that most emulators can provide in a reasonable time. Is there any way to overcome this limitation?

At the VB2005 conference last year I presented a paper entitled 'Defeating polymorphism: beyond emulation'. The paper described a technique called 'dynamic translation' (DT), which can be used instead of emulation for decrypting complex polymorphic malware. The main advantage of this method over emulation is that it is significantly faster. Dynamic translation can also be used to perform generic unpacking, with very few or even no changes required.

Of old and new obfuscation techniques

Theoretically, there is no limit to the complexity of the algorithm that a malicious program can use to obfuscate its code. In practice, however, the complexity of such algorithms is limited both by the fact that the malware in question needs to replicate in a reasonable time, and by the amount of effort that malware writers are willing to invest.

Recently, the majority of new malware has been written in high level languages (HLL), most likely because development is fast and today's malware writers lack assembly skills. Writing polymorphic viruses in a high-level language is very difficult though, and as a result their prevalence has dropped significantly. However, compression algorithms can be developed easily in HLL, and these can be used to pack almost any executable file.

In the case of polymorphic viruses, the number of different encryption algorithms used is so large that writing a specific decryption routine for each of them is not feasible. In the past, however, generic decryption by means of emulation was easier to implement thanks to several factors:

Decryption algorithms required only a few instructions for decrypting each byte, for all but a handful of viruses.
Viruses were quite small, ranging from less than a hundred bytes to a few kilobytes in size; decrypting a virus required emulation of relatively few instructions (from a few hundred to a couple of million).
The amount of memory required was typically small, about the same order of magnitude as the size of the virus.
Decryption typically used only simple arithmetic/logic instructions, so the emulators didn't have to support the full x86 instruction set. FPU/MMX/SSE instructions were rarely used, and (with very few exceptions) they did not affect the decryption process and could be skipped.

On the other hand, the number of different packers/protectors available is small – a few hundred, only a couple of which are polymorphic (e.g. Morphine, Molebox). The number of different compression algorithms used by these is even smaller (Huffman, arithmetic/rangecoder compression, LZW, BurrowsWheeler and a few variants or combinations of these). Therefore, specific unpacking routines to handle all of them can be developed with a reasonable amount of effort.

Algorithms used in compression are more complex and have more stringent memory requirements than the encryption algorithms used by polymorphic viruses. Modern malware, especially those written in high level languages, are also a lot larger than we have seen in the past (up to several hundred kilobytes). Therefore, unpacking them requires many more instructions and proportionally more computational effort.

The use of emulation to achieve unpacking is possible, but it may take a considerable amount of time to complete and it is difficult to implement. Static (specific) unpacking is currently the most widely used unpacking method in AV engines, for both flexibility and performance reasons.

From specific to generic unpacking

The use of packers is becoming increasingly popular among malware writers. A packed piece of malware has a better chance of remaining undetected for a long time, as well as spreading faster due to its smaller size. Packing an existing piece of malware is also by far the easiest way to create a 'new' one. Of the new incoming samples we see, more than 50% are produced simply by repacking existing malware, using different packers.

Presently, there are a few dozen different packing utilities available, most of which can be used by anyone with minimal computer skills. Each of these tools may have several variants or versions – for instance, UPX has more than 20 known versions. There are several scrambler tools that can be used to create modified versions of UPX.

Source code is available on the Internet for a number of packing tools, such as UPX, FSG, Yodacrypt and Morphine. These can easily be modified by any malware writer, by tweaking the compression algorithms or by adding one or even several encryption layers. Such modified packers are being created at a rate of about 10–15 per month.

There is no reliable method of determining whether a given file is packed. Running all available unpacking routines on all files would make scanning unreasonably slow. Therefore, a preliminary check needs to be performed for each supported packer, to determine if a given file might be packed with it. These precheck routines need to be fast, and usually rely on searching for a code pattern known to be present in the unpacker. However, a simple change in the unpacker code would prevent identification of the packer, meaning that unpacking would not even be attempted. In other instances, a packer detection routine may identify a modified packer, but the unpacking routine will not work correctly, due to the modified compression algorithm, added encryption or other changes.

It is possible to write unpacking routines that are resilient to significant variations, but not without more development effort and most likely with some speed penalty as well. In reality, unpacking routines are modified 'after the facta to accommodate new packer versions. The use of generic unpacking would significantly improve proactive detection capabilities while requiring less development effort in the long term, as more and more packers need to be supported.

Overcoming the speed limitation

Emulation has successfully been used in AV engines for a long time to achieve decryption of polymorphic viruses in a generic way. Although emulation can be used for unpacking as well, the method may be too slow to be effective in practice. Typically, emulation is several hundreds of times slower than execution, achieving a speed of about 10 MIPS on an average PC. Large packed files or files packed multiple times may need to run several hundreds of millions or even billions of instructions to be unpacked. Using an emulator for unpacking such files can take several minutes, which is not acceptable for an AV engine.

Emulation speed is not the only difficulty that needs to be addressed in order to perform generic unpacking. Emulators require large amounts of virtual memory, good support for the CPU instruction set, accurate emulation of exception handling, timing and synchronization mechanisms, support for multi-threading, etc. While these problems can be addressed by investing development effort in improving an emulator, the slowdown is a limitation inherent to the method, and one that is very difficult to overcome.

Dynamic translation serves a purpose that is very similar to emulation; however the implementation follows a different principle. An emulator 'interprets' a program's code instruction by instruction. Each instruction is first decoded, in order to determine the instruction type, length, operands, etc. After this, the emulator identifies and calls a routine that updates a data structure describing a virtual system, in the same way as the original instruction would change the state of the real system if executed. These emulation routines are generated statically at 'compile-time' and are part of the emulator.

With dynamic translation, the first step is to divide the program to be analysed into 'basic blocks', that are contiguous blocks of code having a single entry point at the beginning and a single exit point at the end of the code. After each block is identified, it is translated to native executable code, in a process similar to just-in-time compilation. The resultant code is persisted and then executed. A block will only need to be retranslated if the original code is overwritten; this only happens if the program uses self-modifying code – a technique rarely used in programs written in high-level languages. Code that is executed several times, such as a loop construction, will be translated and persisted at the first loop iteration, and for all subsequent iterations the persisted code will be executed.

The method provides a significant speed advantage over emulation, by eliminating the redundant analysis of repeating code sequences. This is especially true in the case of an unpacking routine, which is in fact a loop that uncompresses a few bytes of code at each iteration. An additional increase in speed can be achieved by omitting detection of self-modifiable code for programs written in HLL.

Some specific aspects of generic unpacking

Running a packed binary inside a virtual machine, based on either emulation or DT, will produce an unpacked image of the binary, assuming the virtual machine is implemented properly. However, this does not guarantee detection of a piece of malware that is present in the obtained image, as in most cases, the image will not be identical to the original binary file. This happens because several packers use specific techniques to modify the original binary, in addition to compression, for the purpose of defeating generic unpacking.

A possible mitigation is the use of signature patterns or detection routines that are resilient to these techniques. Another solution relies on reconstructing the original binary from the image obtained by using generic unpacking. This can be achieved by identifying the obfuscation techniques that have been used and running appropriate specific or generic routines to revert the effect of those techniques.

A few packers, such as Pespin, redirect API calls by changing the offsets of indirect call instructions in the packed program to use an address table generated by the unpacking routine. Pespin also uses a 'code stealing' technique to obfuscate some code sequences in the original program. A particular code chunk (starting, for instance, at the original entry point) is selected and removed. In order to maintain the original functionality, an equivalent code sequence is generated at a different address and called instead of the original code. Any signature including the stolen code sequence or modified call offsets will no longer be matched for a packed malware. Detection can be achieved by using dedicated routines to recover the original code, or by not using any stolen/modified bytes as part of a signature.

In some cases, files that are packed multiple times with different packers will no longer be functional. For instance, files that are packed with Molebox and after that packed again with another packer will fail to execute correctly, because Molebox relies on the code integrity of the file to compute a key needed for unpacking. In this case, generic unpacking, based on either emulation or DT, will fail. Specific unpacking routines may still succeed to unpack such a file, by computing the key based on a partially unpacked image of the file immediately before the Molebox layer.

Conclusion

Dynamic translation can be used to achieve generic unpacking with good speed performance. However, detecting generically unpacked malware using an existing signature set is not guaranteed to succeed. New signatures can be extracted to mitigate this problem. Combining DT with specific routines to rebuild the original binary from an unpacked image is yet another solution to be explored.

Latest articles:

Nexus Android banking botnet – compromising C&C panels and dissecting mobile AppInjects

Aditya Sood & Rohit Bansal provide details of a security vulnerability in the Nexus Android botnet C&C panel that was exploited to compromise the C&C panel in order to gather threat intelligence, and present a model of mobile AppInjects.

Cryptojacking on the fly: TeamTNT using NVIDIA drivers to mine cryptocurrency

TeamTNT is known for attacking insecure and vulnerable Kubernetes deployments in order to infiltrate organizations’ dedicated environments and transform them into attack launchpads. In this article Aditya Sood presents a new module introduced by…

Collector-stealer: a Russian origin credential and information extractor

Collector-stealer, a piece of malware of Russian origin, is heavily used on the Internet to exfiltrate sensitive data from end-user systems and store it in its C&C panels. In this article, researchers Aditya K Sood and Rohit Chaturvedi present a 360…

Fighting Fire with Fire

In 1989, Joe Wells encountered his first virus: Jerusalem. He disassembled the virus, and from that moment onward, was intrigued by the properties of these small pieces of self-replicating code. Joe Wells was an expert on computer viruses, was partly…

Run your malicious VBA macros anywhere!

Kurt Natvig wanted to understand whether it’s possible to recompile VBA macros to another language, which could then easily be ‘run’ on any gateway, thus revealing a sample’s true nature in a safe manner. In this article he explains how he recompiled…

Bulletin Archive