Not so random

2011-07-01

Raul Alvarez

Fortinet, Canada
Editor: Helen Martin

Abstract

Pseudorandom generators are increasingly becoming an integral component of modern malware. Raul Alvarez shows how Conficker uses a pseudorandom generator to produce random domain names while retaining its ability to communicate with the Command and Control (C&C) server.


The cat-and-mouse chase between the takedown of botnet Command and Control (C&C) servers and malware that incorporates self-updating technology stepped up a gear when malware started to generate pseudorandom domain names.

A few years ago, botnets updated themselves through static IP addresses coded deep within them, or domain names encrypted within their core. But anti-malware researchers soon became able to determine which IP addresses or domain names are used by a given piece of malware, thus leading the way for proactive takedowns, the closure and blocking of those addresses.

Now, however, malware is capable of creating pseudorandom domain names that are hard to track. The malware is able to update itself by employing a form of Monte Carlo simulation. A Monte Carlo simulation is a methodology that employs random numbers within a given set context.

A simple example is as follows:

We can randomly mark a dot on a sheet of paper. As long as the dot lands on the paper, we can predict roughly where it will be. It is random in the sense that we don’t know the exact point at which the dot will land, but we do know the boundaries within which it is restricted.

Using the same concept, malware and its servers can create random domain names within a given boundary, allowing the malware to update itself while still producing random-looking domains.

This article will show how Conficker uses a pseudorandom generator to produce random domain names while retaining its ability to communicate with the Command and Control (C&C) server, and how the machines infected by Conficker can generate the same pseudorandom domain names in sync.

Conficker

We first saw Conficker spring into action a couple of years ago. Exploiting vulnerabilities, propagating through removable drives and jumping on network shares were some of the ways in which Conficker spread itself. This article focuses on the malware’s pseudorandom generation of domain names.

Is it time yet?

Before executing its domain name generation routine, Conficker checks if the infected machine has an Internet connection by calling the InternetGetConnectedState() API. If there is no Internet connectivity, it will sleep for one minute then check again. It will keep checking until it can establish a connection. Once it is successful, it will proceed to check the current date.

In this particular variant, Conficker checks for a certain date before proceeding to the subroutine of generating the domain names. The date checking starts with a call to the GetSystemTime() API, which returns the current system date and time expressed in Coordinated Universal Time (UTC). If the retrieved date falls before January 2009, it will sleep for three hours by creating a loop of 18 iterations and sleeping for 10 minutes for each iteration. After three hours it will be awakened to check the date again.
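
The waiting logic described above can be sketched roughly as follows. This is only an illustration: the exact cut-off of 1 January 2009 (UTC) is an assumption, and `wait_for_activation`, `now` and `sleep` are hypothetical names; the clock and sleep function are passed in so the loop can be exercised without really waiting.

```python
from datetime import datetime, timezone

# Assumed cut-off: the article only says "before January 2009".
ACTIVATION = datetime(2009, 1, 1, tzinfo=timezone.utc)
NAP_SECONDS = 10 * 60    # one 10-minute sleep
NAPS_PER_CHECK = 18      # 18 naps of 10 minutes = 3 hours


def wait_for_activation(now, sleep):
    """Loop until now() reaches ACTIVATION; return the total seconds slept."""
    slept = 0
    while now() < ACTIVATION:
        # Three hours of waiting, taken as 18 separate 10-minute sleeps.
        for _ in range(NAPS_PER_CHECK):
            sleep(NAP_SECONDS)
            slept += NAP_SECONDS
    return slept
```

In real use `now` would wrap a UTC clock read (the role GetSystemTime() plays in the malware) and `sleep` would be `time.sleep`.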

Planting the seed

Once the date check passes (i.e. the date is later than January 2009), Conficker sets the starting point of the pseudorandom number generator by calling the srand() function. The srand() function accepts one parameter, the seed, which determines the series of pseudorandom numbers that will be generated.

To generate the seed, Conficker XORs all the resulting values from calls to the following APIs:

  • GetCurrentThreadId()

  • GetCurrentProcessId()

  • QueryPerformanceCounter()

  • GetTickCount()

The different seed values ensure that the pseudorandom number generator will generate a different succession of results in the subsequent calls to the rand() function. (A call to the rand() function generates a pseudorandom number.)
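
A minimal sketch of the seed computation follows. The Python calls are portable stand-ins for the four Windows APIs, not the APIs themselves, and truncating the result to 32 bits is an assumption (the article does not say how the 64-bit performance counter is folded into the seed). `xor_seed` and `sample_inputs` are hypothetical names.

```python
import os
import threading
import time
from functools import reduce
from operator import xor


def xor_seed(values):
    """XOR all inputs together and keep the low 32 bits (assumed width)."""
    return reduce(xor, values, 0) & 0xFFFFFFFF


def sample_inputs():
    # Stand-ins only; the malware calls the Windows APIs named in comments.
    return [
        threading.get_ident(),              # ~ GetCurrentThreadId()
        os.getpid(),                        # ~ GetCurrentProcessId()
        time.perf_counter_ns(),             # ~ QueryPerformanceCounter()
        time.monotonic_ns() // 1_000_000,   # ~ GetTickCount(), milliseconds
    ]


seed = xor_seed(sample_inputs())
```

Because the timer values change from run to run, the seed (and therefore the rand() sequence) differs between executions, which is the behaviour the article describes.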

The initial random value

After setting the starting point of the pseudorandom generator, the first random number is retrieved by calling the rand() function and dividing the result by six. The resulting remainder from the division operation is then used to select from one of the following search engines: ‘baidu.com’, ‘google.com’, ‘yahoo.com’, ‘msn.com’, ‘ask.com’, and ‘w3.org’ (see Figure 1).
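
The selection step amounts to taking the remainder of a rand() result modulo six. In the sketch below, the list order is an assumption (it follows the order in the text, since Figure 1 is not reproduced here), and `pick_site` is a hypothetical name.

```python
# Order assumed from the text; the real table order is shown in Figure 1.
SITES = ["baidu.com", "google.com", "yahoo.com", "msn.com", "ask.com", "w3.org"]


def pick_site(rand_value):
    """Use the remainder of rand_value / 6 as an index into the site list."""
    return SITES[rand_value % len(SITES)]
```

For example, a rand() result of 7 leaves a remainder of 1 and selects the second entry.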

Figure 1. List of search engines used.

After adding the string ‘http://www’ to the selected search engine, another subroutine is executed. This subroutine starts by getting the user agent header string (containing information about compatibility, the browser, and the platform name) by calling the ObtainUserAgentString() API (see Figure 2).

Figure 2. User agent string – Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E).

The same header string is supplied as a parameter for a call to the InternetOpenA() API to initialize the use of the WinINet functions. (The WinINet API enables applications to access standard Internet protocols, such as FTP and HTTP [1].)

The selected search engine website, e.g. ‘www.yahoo.com’, is now opened via a call to the InternetOpenUrlA() API, which is immediately followed by a call to the HttpQueryInfoA() API with a query info flag of 0x20000013 (HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE; the low byte 0x13 is 19 in decimal, the status-code index). This flag requests the HTTP status code of the response, returned as a number. Another call to HttpQueryInfoA() with a flag of 0x00000009 (HTTP_QUERY_DATE) retrieves the date and time at which the message was originated.

Blind date

The date and time information is the most important element in the creation of Conficker’s pseudorandom domain names. This information is used to determine the value that synchronizes the domain names generated by the infected machines and by the malware’s Command and Control (C&C) server.

To generate the initial value, Conficker extracts the date, month and year from the information gathered by HttpQueryInfoA() and stores them in memory in SYSTEMTIME format [2]; a quick call to the SystemTimeToFileTime() API changes the time to FILETIME format [3].
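
The conversion can be illustrated as below. A FILETIME counts 100-nanosecond intervals since 1 January 1601 (UTC). The sketch assumes that the time-of-day fields are left at zero, since the article says only the date, month and year are extracted; `date_header_to_filetime` is a hypothetical helper, not a Windows function.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def date_header_to_filetime(header):
    """Return (low_dword, high_dword) of the FILETIME for the header's date.

    Only year, month and day are kept; hours, minutes and seconds are
    assumed to be zero.
    """
    stamp = parsedate_to_datetime(header)
    midnight = datetime(stamp.year, stamp.month, stamp.day, tzinfo=timezone.utc)
    delta = midnight - FILETIME_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000  # 100-ns units
    return ticks & 0xFFFFFFFF, ticks >> 32
```

Since every machine that queries a server on the same day sees the same date, every machine derives the same pair of dwords, whatever its local clock says.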

A series of computations involving the lower and higher four bytes of FILETIME is performed to generate a 64-bit value. This serves as the initial value for Conficker’s pseudorandom number generator. The malware does not use the rand() function to generate its domain names. Instead, it relies on its own pseudorandom number generator, and because every infected machine starts that generator from the same date-derived value, the domain names produced by infected machines in the wild stay synchronized.

Let's generate

Before we proceed further, let’s look closely at Conficker’s pseudorandom generator. The generator subroutine is walked through step by step below.

The subroutine begins with a typical prologue that sets up the stack frame:

55         push ebp
8B EC      mov ebp,esp
83 EC 20   sub esp,20h

The initial 64-bit value that we got from our previous calculations is stored in memory. Let’s call the upper 32-bit value MemLocHigh and the lower 32-bit value MemLocLow. The following code copies the values to the ECX and EAX registers:

8B 0D 94 9D 3B 00      mov ecx,MemLocLow
A1 90 9D 3B 00         mov eax,MemLocHigh

There are four additional memory storages used to hold the temporary 64-bit values for the rest of the calculations. Let’s call them TempMem1, TempMem2, TempMem3 and TempMem4. There are also three memory variables used for 32-bit computation. Let’s call them memA, memB and memC. These variables and memory locations will be used by Conficker in the series of computations that follow.

TempMem1 is zeroed out and the contents of MemLocLow are copied to memA:

83 65 F8 00   and dword ptr [ebp+TempMem1],0
56            push esi
8B D1         mov edx,ecx
57            push edi
89 55 FC      mov memA,edx 

Conficker stores the value of ‘MemLocLow AND 7FFFFFFFh’ in memB, and copies MemLocHigh into TempMem2:

BF FF FF FF 7F   mov edi,7FFFFFFFh
23 D7            and edx,edi
89 45 F0         mov dword ptr [ebp+TempMem2],eax 
89 55 F4         mov memB,edx

The following code introduces the instruction FILD, one of the assembly instructions in the FPU (Floating-Point Unit) instruction set. There are eight 80-bit data registers in the FPU, arranged as a stack: ST0, ST1, ST2, … ST7.

ST0 contains the value at the top of the stack, which is used by the FPU instructions in their computation. Anti-virus emulators often ignore or skip FPU instructions, so malicious programs frequently use them as an anti-emulation trick. Because the results of these FPU instructions determine what the malware does next, anti-virus software that cannot process them properly stands a good chance of missing the actual intent of the malware.

FILD (integer load) is used to convert the TempMem2 value to the 80-bit extended precision format and push the result to ST0:

DF 6D F0      fild [ebp+TempMem2]

Conficker ANDs the value of memA with 80000000h:

BE 00 00 00 80    mov esi,80000000h
21 75 FC          and memA,esi 

It converts the TempMem1 value to the 80-bit format and pushes the result to ST0; the original value of ST0 is now pushed down to ST1:

DF 6D F8      fild [ebp+TempMem1]

It then zeroes out the content of TempMem1 again, and memA once more contains the result of MemLocLow AND 80000000h:

83 65 F8 00   and dword ptr [ebp+TempMem1],0
89 4D FC      mov memA,ecx
21 75 FC      and memA,esi 

FCHS (change sign) is another FPU instruction that changes the sign of ST0:

D9 E0         fchs

This is followed by FADDP, which adds the contents of ST0 and ST1 and places the result in ST1. It then pops ST0 off the stack, so the sum becomes the new ST0:

DE C1         faddp st(1),st
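
One way to read the FILD / FCHS / FADDP sequence so far is as a conversion of an unsigned 64-bit integer to floating point: FILD only loads signed integers, so the top bit is loaded separately, negated, and added back. The sketch below models that reading. It assumes, from the stack offsets in the listing (F8/FC and F0/F4), that memA and memB are the upper dwords of the 8-byte TempMem1 and TempMem2 slots; `fild_qword` and `u64_to_float` are hypothetical names, and Python doubles stand in for the FPU's 80-bit format.

```python
import struct


def fild_qword(low_dword, high_dword):
    """Mimic FILD on an 8-byte slot: read two dwords as one signed integer."""
    raw = struct.pack("<II", low_dword, high_dword)
    return float(struct.unpack("<q", raw)[0])


def u64_to_float(low_dword, high_dword):
    """Model of the listing: load the low 63 bits, then add back the top bit."""
    magnitude = fild_qword(low_dword, high_dword & 0x7FFFFFFF)  # fild TempMem2
    top_bit = fild_qword(0, high_dword & 0x80000000)            # fild TempMem1
    return magnitude + (-top_bit)                               # fchs, faddp
```

When the top bit is set, the second load yields -2^63, FCHS turns it into +2^63, and the addition restores the full unsigned value.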

Conficker copies MemLocHigh to TempMem3, copies MemLocLow to memC, and saves MemLocLow to the regular stack:

89 45 E8      mov dword ptr [ebp+TempMem3],eax
89 4D EC      mov memC,ecx
51            push ecx

FSTP is used to store the value of ST0 to TempMem2 and pop the ST0 content out of the stack:

DD 5D F0      fstp [ebp+TempMem2]

The code that follows continues to manipulate the values of MemLocHigh and MemLocLow:

51            push ecx
DF 6D E8      fild [ebp+TempMem3]
DF 6D F8      fild [ebp+TempMem1]
D9 E0         fchs
DE C1         faddp st(1),st

Conficker stores the value of ST0 to the regular stack and computes the sine of that value.

DD 1C 24         fstp RegStackPointer
E8 65 94 00 00   call MSVCRT.sin

After getting the sine of ST0, another series of FPU instructions is executed. At the end of the code below, Conficker gets the log of ST0:

83 C4 08         add esp,8
DD 5D E0         fstp [ebp+TempMem4]
83 65 F8 00      and dword ptr [ebp+TempMem1],0
89 55 FC         mov memA,edx
21 75 FC         and memA,esi 
23 D7            and edx,edi
89 45 E8         mov dword ptr [ebp+TempMem3],eax
89 55 EC         mov memC,edx
DF 6D E8         fild [ebp+TempMem3]
51               push ecx
DF 6D F8         fild [ebp+TempMem1]
51               push ecx
D9 E0            fchs
DE C1            faddp st(1),st
DC 45 E0         fadd [ebp+TempMem4]
DC 4D F0         fmul [ebp+TempMem2]
DC 4D F0         fmul [ebp+TempMem2]
DD 5D E0         fstp [ebp+TempMem4]
DD 45 F0         fld [ebp+TempMem2]
DD 1C 24         fstp RegStackPointer
E8 06 94 00 00   call MSVCRT.log

Finally, Conficker copies the value of ST0 to MemLocHigh and MemLocLow using the FSTP instruction. The return value at register EAX also contains the new MemLocHigh value.

59                  pop ecx
59                  pop ecx
5F                  pop edi
DD 1D 90 9D 3B 00   fstp MemLocHigh
A1 90 9D 3B 00      mov eax,MemLocHigh
5E                  pop esi
C9                  leave
C3                  retn

The new values of the MemLocHigh and MemLocLow memory locations will now be supplied as the 64-bit value for the next execution of the pseudorandom generator.

Wrapping up

Conficker’s pseudorandom generator accepts a 64-bit value. It performs a calculation on this 64-bit value using FPU instructions such as FILD, FCHS, FADDP, FSTP and FMUL. These instructions use the special stack registers ST0, ST1, …, ST7. Conficker also uses the mathematical functions sine and log to produce a different numeric result.

After the long and tedious calculations, the end result is a new 64-bit value. This new 64-bit value is used as the input parameter for the next call to the pseudorandom generator.

The lower 32-bit value is stored in the EAX register, which is essential in the generation of the domain names.

Time to generate domain names

Conficker’s pseudorandom number generator is an important component in generating the pseudorandom domain names that are recognized by all Conficker-infected machines (of the same variant) and its C&C servers.

The actual domain name generating routine can be divided into three blocks of code (see Figure 4).

The first block of code, block A, sets up the counter for creating 250 (number varies by variant) domain names. Each domain name is stored in a memory block allocated by a call to the GlobalAlloc() API.

The second block of code, block B, starts by calling Conficker’s pseudorandom generator routine. The resulting EAX value from the routine is converted by the CDQ instruction to a quadword in EDX:EAX via sign extension. (For example: if EAX is zero or positive, EDX will be 0x00000000; otherwise, if EAX is negative, EDX will be 0xFFFFFFFF.)

PUSH 4, POP ECX and IDIV ECX divide the value in EDX:EAX by four, yielding the remainder in EDX. The possible values for the remainder in EDX range from -3 to 3. Adding eight to the remainder gives us the number of characters to be generated for the new domain name.

The resulting EAX from a call to the pseudorandom generator is converted to its absolute value by calling the labs() function (which calculates the absolute value of a long integer). The value is now divided by 0x1A (26 in decimal), and the remainder determines which letter of the alphabet has been selected; adding 0x61 to the remainder gives the ASCII code of the corresponding lower case letter (0x61 is ‘a’).
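
The arithmetic can be checked with a short sketch. IDIV truncates toward zero, so the remainder takes the sign of the dividend, which is why the range is -3 to 3 rather than 0 to 3. The function names are hypothetical, and `idiv_remainder` assumes a positive divisor.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def cdq(eax):
    """Return EDX after CDQ: all ones if EAX is negative, otherwise zero."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000


def idiv_remainder(eax, divisor):
    """Remainder of a signed division that truncates toward zero."""
    dividend = to_signed32(eax)
    remainder = abs(dividend) % divisor
    return -remainder if dividend < 0 else remainder


def name_length(eax):
    """Number of letters in the domain label: 8 plus the remainder."""
    return 8 + idiv_remainder(eax, 4)
```

For instance, EAX = 0xFFFFFFF9 is -7 as a signed value; the remainder is -3 and the label gets five letters, while EAX = 7 gives a remainder of 3 and eleven letters.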

The JMP instruction creates the loop that generates the pre-computed number of lower case letters for the domain name.

The third block of code, block C, ANDs the value of EAX from a call to the pseudorandom generator with seven. This yields an index from 0 to 7, which selects the TLD (top-level domain) suffix from one of the following: .cc, .cn, .ws, .com, .net, .org, .info and .biz (see Figure 3). The selected TLD suffix is now appended to the domain name generated from block B.

To summarize, in this Conficker variant, 250 domain names will be generated. Each domain name consists of lower case letters of the alphabet that range from five to 11 characters with the TLD suffix taken from the eight possible TLD strings. Note that each call to the pseudorandom generator produces a new 64-bit value that acts as the new input for the same routine.
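
Putting the three blocks together gives a sketch like the one below. Two things here are assumptions rather than facts from the sample: the generator is a placeholder (a plain 64-bit linear congruential generator, not Conficker's FPU-based routine), and the TLD order follows the text rather than the table in Figure 3. All names are hypothetical.

```python
TLDS = [".cc", ".cn", ".ws", ".com", ".net", ".org", ".info", ".biz"]
MASK64 = 0xFFFFFFFFFFFFFFFF


class PlaceholderPrng:
    """Stand-in generator (a simple LCG), NOT Conficker's sin/log routine."""

    def __init__(self, state):
        self.state = state & MASK64

    def next_eax(self):
        # Each call turns the current 64-bit state into the next one.
        self.state = (self.state * 6364136223846793005
                      + 1442695040888963407) & MASK64
        value = self.state >> 32  # upper half used: an LCG's low bits are weak
        return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def truncated_rem(value, divisor):
    """Signed remainder that truncates toward zero, like IDIV."""
    rem = abs(value) % divisor
    return -rem if value < 0 else rem


def make_domain(prng):
    length = 8 + truncated_rem(prng.next_eax(), 4)            # 5 to 11 letters
    label = "".join(chr(0x61 + abs(prng.next_eax()) % 26)     # 'a' to 'z'
                    for _ in range(length))
    return label + TLDS[prng.next_eax() & 7]                  # one of 8 TLDs


def make_domains(seed, count=250):
    prng = PlaceholderPrng(seed)
    return [make_domain(prng) for _ in range(count)]
```

Two callers that start from the same seed obtain the same list, which is the property that lets infected machines and the C&C side agree on the domains without communicating.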

Figure 3. TLD strings.

Figure 4. Blocks of code for the domain name generation.

Figure 5. Random domain names generated by Conficker (some letters intentionally erased).

On an ending note

Pseudorandom generators are increasingly becoming an integral component of modern malware, not just for generating random domain names. Conficker also shows that an anti-virus system that is not capable of emulating FPU instructions will be left behind. Other Conficker variants have slight variations in their pseudorandom generator, yet the same idea remains.

Conficker synchronizes its generated domain names with other infected machines and C&C servers by using the date and time taken from a randomly selected search engine website.

In addition, we have recently seen domain name generation in the Licat file infector, the Srizbi trojan [4], and some phishing-capable trojans. The common denominator between Conficker and these pieces of malware is the use of the current date and time for synchronization; the use of random domain names will only be successful if they can also be generated by their C&C servers.

They are out there. Hundreds of pieces of malware with domain name generation capability are around, and there are more to come. The question is: can we catch up?
