Needle in a haystack


Gabor Szappanos

Sophos, Hungary
Editor: Helen Martin


Sometimes what looks like a genuine MP3 encoder library, and even works as a functional encoder, actually hides malicious code deep amongst a pile of clean code. Gabor Szappanos reveals the lengths to which one piece of malware goes to hide its tracks.

Malware authors engaged in Advanced Persistent Threat (APT) operations put great effort into making sure their creations live up to their name and achieve persistence over the course of months or years; in order to do so, the threats must remain undetected by security products.

The authors try both to conceal the presence of the threats on infected systems and to hide their code from analysis and detection. Most crimeware authors achieve the latter by applying sophisticated execryptors and protectors to their code.

Over the past year, however, we have spotted a different approach: malicious code is compiled into an open source library, hidden among a large pile of clean library code, with only a single export pointing to the trojan functionality. The deployment and progression of this malware spans about two years now – however its versioning suggests that its development started longer ago than that.

This malware doesn’t take anything for granted: even common system tools like rundll32.exe and wscript.exe, which are present on all Windows systems, are carried with the installer and dropped when needed.

The malware goes to great lengths to cover its tracks. All of the string constants that could reveal the nature of the backdoor are protected with strong encryption. Additionally, the backdoor itself is disguised as a legitimate MP3 encoder library. In fact, it is a legitimate and functional MP3 library – and a bit more besides.

Exploited carrier workbook

In a handful of cases we have been able to identify the original exploited document that leads to the system infection. At the time of finalizing this paper, three exploited workbooks have been found that install this threat.

All of them are protected Excel workbooks with the default password (for more details see [1]). In short: the workbooks are password protected (that is, checked before opening). It is possible to leave the password field blank – in which case Excel encrypts the content using the default password: ‘VelvetSweatshop’. On the other hand, if a workbook is protected with exactly this password, Excel assumes that there is no password, and opens the document transparently. As a result, the document content is encrypted and hidden from normal analysis, but opening it will execute the shellcode without further prompting.

The workbooks exploit the CVE-2012-0158 vulnerability, which triggers the execution of shellcode within the document.

After the workbooks are opened, the intended operation is to open a decoy workbook – a clean file that grabs the attention of the user while malicious activities proceed in the background. The themes of the decoys give us some idea as to the areas of interest of the target audience of this malware distribution.

Workbook 1

Filename: 300.xls (rough translation: ‘300 petitioners cosigned.xls’)

File size: 839756 bytes

SHA1: 066998e20ad44bc5f1ca075a3fb33f1619dd6313

MD5: 5c370923119f66e64a5f9accdd3d5fb

This does not display any decoy document, just closes the Excel window. Nevertheless, the shellcode execution proceeds.

If the file was opened, it would display a workbook with a list of names, gender, region and phone numbers of Chinese individuals.

Decoy content for 066998e20ad44bc5f1ca075a3fb33f1619dd6313.

Figure 1. Decoy content for 066998e20ad44bc5f1ca075a3fb33f1619dd6313.

Workbook 2

Filename: sample.xls

File size: 638912 bytes

SHA1: e5e183e074d26416d7e6adfb14a80fce6d9b15c2

MD5: 2066462274ed6f6a22d8275bd5b1da2b

Decoy content for e5e183e074d26416d7e6adfb14a80fce6d9b15c2.

Figure 2. Decoy content for e5e183e074d26416d7e6adfb14a80fce6d9b15c2.

Workbook 3


File size: 638912 bytes

SHA1: d80b527df018ff46d5d93c44a2a276c03cd43928

MD5: 80857a5541b5804895724c5d42abd48f

This decoy workbook contains information about key officials in the Philippines Department of National Defense (DND).

Decoy content for d80b527df018ff46d5d93c44a2a276c03cd43928.

Figure 3. Decoy content for d80b527df018ff46d5d93c44a2a276c03cd43928.

In the rest of this article, unless specified otherwise, we refer to the operation resulting from infection via Workbook 1 – but the overall operations (dropped filenames, registry keys, backdoor functions) are the same in each case.

When mining our sample collection for related samples we were able to spot other examples – however, in these instances the initial dropper was not available for our analysis, only the temporary dropper executables or the final payloads could be located. In these cases we don’t have complete information about the system infection, but it is safe to assume that similar exploitation schemes were utilized.


The shellcode features an interesting anti-debugging trick that I have come across quite regularly in APT samples lately. Most of the Windows API functions are resolved and called normally, but some of the critical ones (such as WinExec and CreateFile) are not entered at the entry address (as stored in the kernel32.dll export table), but five bytes after it instead. These functions are responsible for the most critical operations of the code (dropping the payload executable and executing it), which would reveal unusual activity in the scope of an ordinary Excel process.

As most tracers and debuggers would place the breakpoint or hijack function right at the entry of the API function, skipping the first few bytes is a good way to avoid API tracing and debugging.

Anti-tracing trick.

Figure 4. Anti-tracing trick.

(Click here to view a larger version of Figure 4.)

The same happens with WriteFile and GlobalAlloc, but this time, depending on whether or not there is a call right at the entry of the function, the displacement will be either five or seven bytes.

Anti-tracing hook initialization.

Figure 5. Anti-tracing hook initialization.

As a result of the functions not being entered at their usual entry points, the first few instructions are missed. As these are still essential for the stack management, the code is compensated within the shellcode, where a standard function prologue (stack frame creation push ebp, move ebp,esp) is executed.

For system functions compiled with standard compilers, the first few instructions are fixed on the entry point, but anything after that can’t be taken for granted. The shellcode can’t enter further than five or seven bytes into the API function, otherwise it could end up in the middle of a multi byte instruction, easily crashing the application.

Anti-tracing used in practice.

Figure 6. Anti-tracing used in practice.

In order to extract the embedded executable, the shellcode needs to find the carrier workbook. It does this using the fact that, at the time of the exploitation, the workbook must remain open in Excel. The code enumerates all possible handles and tries to call GetFileSize on each of them. If the function fails, because the handle does not belong to an open file (it could belong to many other objects such as directory, thread, event or registry key), or the file size is smaller than the expected size of the workbook (minus the appended encrypted EXE), 1de10h bytes, it skips to the next handle value.

Next, it reads four bytes from offset 0x1de00; the value found there should be equal to the size of the carrier workbook (this time including the appended EXE).

At this position, in the appended content following the OLE2 document structure, a short header is stored that contains the full carrier workbook size, the embedded EXE size and the embedded decoy workbook size. These values are used by the shellcode. The encrypted EXE content follows.

Organizing the code and structure in this manner makes the carrier/dropper workbook component and the dropped payload executable completely independent – it is possible to replace the payload with a new variant without changing a bit in the carrier encrypted workbook.

Appended header and payload.

Figure 7. Appended header and payload.

Once the hosting workbook is found, the code proceeds with decoding the embedded executable (using a one byte XOR algorithm with running key plus an additional one byte XOR with a fixed key), saving it to a file named ‘Winword.exe’ in the %TEMP% directory, then executing it. At this point, the decoy workbook content is dropped (using the same algorithm: one byte XOR with running key plus one byte XOR with fixed key, only this key differs from the one used in decoding the EXE).

Temporary dropper

This file is the dropper and installer for the final payload. It has an initial anti-debug layer.

The address of the GetVersion function is patched in the import table, to contain an internal function virtual address instead of an imported function address, which is normally expected at that position. The code around the entry point uses the stored value to redirect execution:

mov  large fs:0, esp
sub  esp, 58h
push ebx
push esi
push edi
mov  [ebp-18h], esp
call ds:dword_41A188

The execution actually goes to the address stored at dword_41A188, which is the memory location 00402440.

The program has only one export, LoadLibrary, thus when the operating system loads the program and resolves the external dependencies, this value, stored within the import table region, remains intact. The trick completely fools IDA Pro, which can’t be convinced that the location is an internal position and not an external import. This makes static analysis a bit more complicated. The necessary imported function addresses are later resolved dynamically by the initialization code of the dropper.

The major procedures of the dropper program are not called directly; instead, the trojan builds a function pointer table, and calls to procedures are performed via indexing into this table, as shown in Figure 8.

Building the function pointer table.

Figure 8. Building the function pointer table.

The key procedures are identified by having the following instruction sequence near the prologue:

push ebp
mov  ebp, esp
push eax
mov  eax, 12547908h
pop  eax

The value stored in the EAX register is a combination of two elements: 1254 is the marker; 7908 is the numeric ID for the function.

The entry is located by searching backwards for the standard prologue:

push ebp
mov  ebp, esp

The procedures are later invoked by calling indexes from the function pointer table (see Figure 9).

Using the function pointer table.

Figure 9. Using the function pointer table.

Winword.exe normally drops three major components into the system:

  • %PROGRAM FILES%\Common Files\ODBC\AppMgmt.dll – the final payload (Windows DLL file)

  • %PROGRAM FILES%\Common Files\DBEngin.EXE – a copy of rundll32.exe (a clean Windows system file, used for executing the payload)

  • %PROGRAM FILES%\Common Files\WUAUCTL.EXE – another rundll32.exe (a clean Windows system file, used for executing the payload).

Additionally, two registry export files named jus*.tmp (with a random number added after jus) are dropped into %TEMP%. These are the old and new hives of the HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt registry location – a location at which the trojan registers itself in order to execute automatically upon each system boot. Saving the hives to a file makes it possible to modify the registry in one shot using RegRestoreKey.

Also dropped is a 301,445 byte long jus*.tmp file, which is a CAB archive containing the payload DLL.

The execution flow takes a different route if the presence of running security products is detected. The following process names are checked: KVMonXP.exe, RavMonD.exe, RsTray.exe, ccsvchst.exe, QQPCTray.exe, zhudongfangyu.exe, 360sd.exe, 360Tray.exe, zatray.exe, bdagent.exe, ksafetray.exe, kxetray.exe and avp.exe. However, not all of the security processes are checked at the same time – only a couple of selected ones are checked before each major operation.

As an example, if zatray.exe, RsTray.exe or RavMonD.exe is running, then AppMgmt.dll is not dropped and instead, the 400MB vbstdcomm.nls is created (the large size is due to an enormous amount of junk appended at the end of the file). Finally, a VBScript file is created and executed with the help of a dropped copy of wscript.exe (both files are saved to the %TEMP% folder, as lgt*.tmp.vbs and lgt*.tmp.exe, respectively). An encrypted copy of Winword.exe is created in %CommonProgramFiles%\ODBC\odbc.txt, using a one byte XOR algorithm with key 0xCC. Vbstdcomm.nls, which serves as a backup installer, takes the encrypted copy of Winword.exe, decodes it and simply executes.

The dropper registers AppMgmt.dll as a service. This is not achieved by creating a new service entry, rather by taking over the role of an already installed service, AppMgmt, redirecting the service DLL from the clean library to the dropped malware payload:

HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt\Parameters: ServiceDll

%SystemRoot%\System32\appmgmts.dll -> C:\Program Files\Common Files\ODBC\AppMgmt.dll

In addition, the start up mode is changed from auto to demand in the location:

HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt: Start

Then it changes the error control settings in the registry key HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt:ErrorControl from normal (this would mean that if the driver fails to load, the start up process proceeds, but a warning is displayed) to ignore (in this case if the driver fails to load, start up proceeds, and no warning is displayed). The change is designed to avoid raising suspicion, should start up fail for any reason.

Finally, it executes the dropped DLL by executing net start AppMgmt.


We have identified five different versions of the final payload. Two of them were replicated from the exploited workbooks detailed earlier; the other three were found when we were digging through our sample collection searching for samples with similar characteristics.

The main characteristics of the five variants are summarized in Table 1 shown below (detailed descriptions of the columns are provided later in this section).

(Click here to view a larger version of Table 1.)

This DLL is built from the LAME MP3 encoder source [2]. The full library has been compiled, and in addition, a couple of malicious exports have been added to the code: lame_set_out_sample and lame_get_out_sample.

Additional malicious imports.

Figure 10. Additional malicious imports.

Note that the names of the additional exports are strikingly similar to the legitimate exports, lame_set_out_samplerate and lame_get_out_samplerate, which are present in the LAME source – thus it is not very obvious that the additional exports belong to something completely different.

One of the extra exports, lame_get_out_sample, is missing from newer versions of the malware. However, the function that would invoke this export is still present in the code. Clearly, the code was not cleaned up properly when the export was removed.

The backdoor contains many encrypted strings, one of which serves as an internal version number. In Table 1 we list the version numbers as they appear in the code. Collected information suggests that the most widely distributed was version 2.3(UDP), making its rounds in the wild in early December 2012.

Table 1 also lists the date when we first saw each particular backdoor variant – either arriving in our collection, reported in cloud look ups or seen elsewhere on the Internet. Additionally, the compilation date is listed, as taken from the PE header.

An interesting quirk comes from the usage of the LAME source: one of the original source functions, beVersion(), inserts the compilation date into the data section of the executable.

Compilation date in code.

Figure 11. Compilation date in code.

This provides an independent method of determining the creation date of the variant aside from the PE time stamp. There was no trick, however – the two dates matched in all cases.

It is notable that there is always a large gap between the compilation date and the date of the first observation of each variant. There are several possible reasons for this:

  • Small-scale targeted attacks don’t provide much telemetry information; the smaller the number of targets, the slimmer our chances of finding out about their infection.

  • The trojan looks very similar to a real LAME encoder library; infected victims are reluctant to submit it for analysis.

  • There may be an intentional delay (some sort of testing period) in the release process of the malware.

The backdoor uses different approaches for handling C&C communication. Earlier versions used the standard Windows socket communication functions (send, recv) to exchange data with the C&C server. The newer versions linked the UDT data transfer library (available from for communication. The versioning of the variants suggests that some time around March 2012, the code forked into a socket communication branch (TCP) and a UDT powered communication branch (UDP).

The backdoor features all the basic functionality that is expected from a piece of malware of its class. It is able to:

  • Create screenshots

  • Get drive type (FAT, FAT32, NTFS, CDFS) and free space

  • Enumerate files and directories and send the list to the server

  • Rename files

  • Create directories

  • Delete files.

The last character of the ModuleFileName (without extension) is checked on execution: if it is not of one of the expected values – ‘T’, ‘t’ (executed via net.exe), ‘R’, ‘r’, ‘N’, ‘n’ (executed via DBEngin.EXE), ‘2’ (rundll32.exe), ‘L’ or ‘l’ – it builds and injects a simple piece of code to load AppMgmt.dll properly.

For this purpose, it creates a new suspended process (with command line: c:\windows\system32\svchost.exe), calls GetThreadContext on it, and gets EAX from the CONTEXT structure, using the fact that in the case of a suspended process the EAX register always points to the entry point of the process. Then it writes the starter code to this entry point and resumes the thread. The suspended thread is not visible in the process list at that point. This way, the trojan can escape analysis, if not executed in a natural form, and still execute.

Configuration data is stored in a file named DbTrans.db, XOR encrypted with key 0x58.

The string constants (API names, DLL names, process names) are all stored in encrypted form using a strong encryption algorithm. The strings are stored aligned (Unicode strings to 0x90 bytes, ASCII strings to 0x38 bytes boundary), decrypted in eight byte chunks using the DES ECB algorithm, and referenced by IDs that index into this name pool. The encrypted strings contain padding bytes at the end, where zeros are encoded.

The strings are decrypted on the fly before being used and filled with zeros after use. This way there are no visible strings in the memory that would give away more information about the internals of the backdoor.

There are three nearly identical encryption functions (and accompanying encrypted string tables and encryption keys) in all variants: one is for the Unicode strings, one for the ordinary ASCII constants, and a third one for the Windows API function names (also stored as ASCII strings) that are used in the code. We found that only the encryption keys were different for the three cases. The following key seeds remain the same throughout the variants:

For ASCII strings: 82 C5 D3 59 2B 38 00 00

For Unicode strings: 5E 97 CC 42 8E CD 00 00

For API function names: 5B 5F CB 8D E5 F5 00 00

In the last version, the two ASCII functions are merged into a single function.

The C&C addresses are hard coded into the backdoor, and protected with a simple byte wise XOR (key:0x58) encryption. This is an interesting choice, given that all other string constants are protected with a string DES algorithm – perhaps the server addresses are changed more frequently (indeed, there is a minimal overlap between the different versions’ C&C addresses) than the authors are comfortable with re-encrypting the strings – but no evidence was found for it in the few samples we have found.

The string constants of the code are referenced by IDs and decrypted on the fly. However, there are strings that are never used in the code. These could belong to an earlier or internal version, and simply have not been cleaned up from the string pool, as illustrated in this example:

push 9   ; ,lame_set_out_sample
call Get_String_A
push 0Ah ; ,
call Get_String_A
push 1Eh ; DBEngin.exe
call Get_String_A
push 8   ; EXPL.EXE
pop  eax
call Get_String_W
push offset s_expl_exe
push [ebp+var_254]
call StrCpyW
push 8
pop  eax
xor  ecx, ecx
call set_mem
push ebx
push 2
call CreateToolhelp32Snapshot

Some of these strings could be internal configuration options for the development environment (I suspect these are access details to an internal server):


Other strings provide status information about the current operation of the backdoor:

Client RecvData Complete
A File Search Task has start already !!!
File Search Task Success
File Search Task Failed, Please Check
Upload Client Failed
Upload Client Success
Delete File Success
Delete File Failed 
Rename File Success
Rename File Failed
Create Folder Success
Create Folder Failed

A few constants indicate undocumented or debug functionality:



When looking into APT attack scenarios, one has to be extra careful. Often we see that clean programs and libraries are dropped onto systems to hide the operation of malicious applications [3]. But sometimes, what looks to be a genuine MP3 encoder library, and even works as a functional encoder, actually hides malicious additions buried deep in a large pile of clean code. One has to be very thorough when it comes to targeted attacks, and one cannot afford to make any assumptions.


[1] Baccas, P. When is a password not a password? When Excel sees “VelvetSweatshop”.

[2] LAME (Lame Aint an MP3 Encoder).

[3] Szappanos, G. Targeted malware attack piggybacks on Nvidia digital signature.



Latest articles:

VB2018 paper: Unpacking the packed unpacker: reversing an Android anti-analysis native library

This paper analyses one of the most interesting anti-analysis native libraries we’ve seen in the Android ecosystem. No previous references to this library have been found.The anti-analysis library is named ‘WeddingCake’ because it has lots of layers.

VB2018 paper: Draw me like one of your French APTs – expanding our descriptive palette for cyber threat actors

When it comes to the descriptive study of digital adversaries, we’ve proven far less than poets. Currently, our understanding is stated in binary terms: ‘is the actor sophisticated or not?’. Juan Andres Guerrero-Saade puts forward his views on how we…

VB2018 paper: Office bugs on the rise

It has never been easier to attack Office vulnerabilities than it is nowadays. In this paper Gabor Szappanos looks more deeply into the dramatic changes that have happened in the past 12 months in the Office exploit scene.

VB2018 paper: Tracking Mirai variants

Mirai, the infamous DDoS botnet family known for its great destructive power, was made open source soon after being found by MalwareMustDie in August 2016, which led to a proliferation of Mirai variant botnets. This paper presents a set of Mirai…

VB2018 paper: Hide’n’Seek: an adaptive peer-to-peer IoT botnet

This paper presents a thorough analysis of the inner workings of Hide’n’Seek, a peer-to-peer IoT botnet discovered in January 2018. With an exploit table that can be updated in memory and modular in its approach, Hide’n’Seek gives us a glimpse of…

Bulletin Archive

We have placed cookies on your device in order to improve the functionality of this site, as outlined in our cookies policy. However, you may delete and block all cookies from this site and your use of the site will be unaffected. By continuing to browse this site, you are agreeing to Virus Bulletin's use of data as outlined in our privacy policy.