Run your malicious VBA macros anywhere!

Kurt Natvig

Independent researcher, https://LibNotFound.com


 

Introduction

Obfuscation is an old trick every malware researcher and scanner engine needs to get around in order to find the real content of the sample they are analysing. The type and level of obfuscation varies, but in general, the idea is to make it difficult to understand what a sample is really doing – which can reduce the accuracy in correctly handling it.

For decades, Office documents have been used to launch malware, often through macros, embedded content or exploits. Embedded ‘executable’ content is usually very visible, and with most exploits, even if you don’t know exactly what is being exploited, the presence of strange data in strange locations is usually a good giveaway that something is going on. The same is true for hand-crafted RTFs with lots of obfuscation – they just shine in the dark.

I wanted to understand whether it’s possible to recompile VBA macros to another language, which could then easily be ‘run’ on any gateway and thus reveal the sample’s true nature in a safe manner. Documents do have some privacy concerns, and being able to carry out a full inline analysis of any (malicious) document on, for example, an email server, using something that is light, accurate, inexpensive and flexible, could improve both the accuracy of decisions and the time taken to make them. Regular sandbox solutions that require Windows, Office, monitoring agents and quite a bit of hardware are neither light nor inexpensive.

 

The goal of this article

My goal is to recompile malicious VBA macro code to valid, harmless Python 3.x code. The generated Python 3.x version will just report what is happening, not perform the malicious actions – with the possible exception of performing downloads, so the data is retrieved while it is still available in case you want to re-run the analysis later.

Converting VBA to Python started as an idea, and after putting numerous hours into this project (vba2python) I’ve learned some lessons that I wanted to share with my fellow researchers.

There are three main steps involved in creating such a tool:

  1. Extracting the content needed to recompile the code correctly.
    • VBA source code.
    • Elements from the document (Word, Excel, etc.) that are referenced by the VBA macro code (or access to the various streams directly).
  2. Automatically generating good Python 3.x code based on the VBA code and data from the streams.
  3. Providing an application-world to the generated code that makes the VBA API and object-model fit into the Python world.

 

Gathering the data needed to understand the VBA world

Office documents can be stored in many physical forms, and these forms can also be embedded in numerous other physical forms. For instance, an email message file can be an MHTML object, that again contains ActiveMime encoded data, which finally reveals the OLE2 document file inside.

There are many public tools out there that can provide this data for you. Once you get down to the actual document there are also numerous tools that can extract the VBA source code, but I haven’t seen many tools that can provide the cell/document access needed for a lot of samples. I have seen Python packages that can do this directly with OOXML, and maybe there are other packages that can do this directly with OLE2 containers too.

 

Generating the Python 3.x code

This is the hardest part. If you are familiar with VBA and Python, you may think there are many similarities. But once you convert one line at a time and have to deal with statement order, dependencies, class initialization, VBA-only features, conflicts and such, you run into a lot of problems at once. Don’t let this deter you.

Let me start to illustrate this with a simple sample: 000475fc6e6705bbc5ebad8cc3af23c6a44b6ab7.

VBA-fig1.png 

This is a very simple sample and it wouldn’t take a lot of time to convert it manually to Python 3.x. As you’ll find out, manual conversion and automatic conversion are two completely different things, but you need to start somewhere.

There are no arrays, complex equations, predefined variables or classes, divides that cause incompatibility with Python, etc. Here you see some variables being defined, two objects being created and used (download and store), and at the end something is being executed via Shell.

When this is auto-converted to Python 3.x it looks like this:

VBA-fig2.png 

When this code runs, it produces the following output:

VBA-fig3.png 

As you can see, it’s a simple downloader – but you saw that already with the initial VBA macro code as it wasn’t obfuscated much. This was just to get warm. The output of a sample is just the application-world printing out the behaviour it wants to report, while it acts as the Office world for the sample.

Sample 2 (e4debf873d683a51626882ba69364b54e5881799) will let us start removing obfuscation. The Workbook_Open macro of this sample starts like this:

VBA-fig4.png 

As you can clearly see, the Select Case statements look a bit funky (I had to read them a few times before I realized what it was trying to do), but if you take a closer look at the variable the select is from (m222371a95aa9d8), it’s initially set to 3 – and this ‘Case’ is the only one you need. Of course, you don’t know if ‘that is always the case’ so you port all the code to Python – always. This is just done to confuse an algorithm or human.

VBA-fig5.png 
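As a rough illustration of how such a dispatcher can be ported, the sketch below turns a Select Case into an if/elif chain. Only the variable name m222371a95aa9d8 comes from the sample; the branch bodies and the `report` helper are hypothetical placeholders, not the sample's real code.

```python
def report(message):
    """Hypothetical logging helper: record behaviour instead of performing it."""
    print(f"[vba] {message}")
    return message


def workbook_open():
    # In the sample the selector is initialised to 3 before the Select Case.
    m222371a95aa9d8 = 3

    # Every branch is ported, even those that look dead: the converter cannot
    # assume the selector always keeps its initial value.
    if m222371a95aa9d8 == 1:
        return report("case 1 (placeholder decoy branch)")
    elif m222371a95aa9d8 == 2:
        return report("case 2 (placeholder decoy branch)")
    elif m222371a95aa9d8 == 3:
        return report("case 3 (placeholder for the real payload branch)")
    else:
        return report("case else (placeholder)")
```

Running `workbook_open()` only reaches the third branch, but the other branches stay in the generated code in case the selector is changed elsewhere.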

Case 3 just creates a specific object based on an encrypted string – decrypted via function rd165a9f386b4b. Once this object is created, it wants to execute the Exec method of the object. To find out what it wants to execute, it spawns the same decryption function (rd165a9f386b4b) with data from a specific Excel cell:

index,name,row,col,value
1,ZAOIQ,6,134,"8281897784857A777E7E40778A77323F897B8076818985868B7E77327A7B76767780323F8081828481787B7E77323F578
A777587867B818062817E7B758B3264777F818677657B798077761F1C78878075867B818032804645787877328D1F1C827384737F3A367A7
3467878743B1F1C3685454548......

To find the cell information you need to enumerate the Workbook stream and look for records like:

• Formula: to get the parsed expressions of code running.

• SST/extSST: to find strings and their locations in the sheets.

• LabelSst/Lbl: to find labels used in Formula parsed expressions.

• Dimension handler: to find the sheet dimensions used.

• Rk and MulRk: to find integers and floats and their locations in the sheets.

After all these are parsed you will have a good map which is provided via the Excel object-model to the VBA/XF code.
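One possible shape for such a map is sketched below. The names `Sheet`, `CellRange`, `set_cell` and `a1_to_index` are hypothetical, and returning an empty string for an unknown cell is a simplifying assumption. Note that "G135" is zero-based column 6, row 134, which matches the pair 6,134 in the dump above.

```python
import re


def a1_to_index(ref):
    """Convert an A1-style reference such as 'G135' to zero-based (row, col)."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+)", ref)
    if not match:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number without a zero digit.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


class CellRange:
    """Minimal stand-in for a single-cell Range object."""

    def __init__(self, value):
        self.Value = value


class Sheet:
    """Holds cell values collected while parsing the Workbook stream."""

    def __init__(self, name):
        self.name = name
        self.cells = {}  # (row, col) -> value, both zero-based

    def set_cell(self, row, col, value):
        self.cells[(row, col)] = value

    def Range(self, ref):
        # Assumption: an empty cell is reported as an empty string.
        return CellRange(self.cells.get(a1_to_index(ref), ""))
```

With this in place, converted code such as `Sheets("ZAOIQ").Range("G135").Value` only needs a `Sheets` lookup that returns the right `Sheet` instance.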

Once it gets the data (above) it calls the decryption function:

VBA-fig6.png 

This is nothing fancy: it reads two characters at a time and converts them to integers so they can be manipulated and then converted back to characters and appended to the destination string. The beautiful consequence of converting the code and running it is that you don’t really care what it does or how it does it, you want to know the effect of it.
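To make the idea concrete, here is a minimal sketch of a decoder of that kind. The function name `decode_hex_shift` is hypothetical, and the shift of 18 is an assumption inferred from the hex strings quoted in this article; the sample's own routine may perform the manipulation differently.

```python
def decode_hex_shift(data, shift=18):
    """Decode pairs of hex digits, subtracting `shift` from each byte value.

    The default shift is inferred from strings quoted in the article and is
    an assumption, not taken from the sample's decryption routine.
    """
    out = []
    # Walk the string two characters at a time; a trailing odd digit is ignored.
    for pos in range(0, len(data) - 1, 2):
        value = int(data[pos:pos + 2], 16)
        out.append(chr((value - shift) % 256))
    return "".join(out)


print(decode_hex_shift("696575847B828640657A777E7E"))    # WScript.Shell
print(decode_hex_shift("8281897784857A777E7E40778A77"))  # powershell.exe
```

The second input is just the first 28 hex digits of the cell value shown earlier.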

Once the entire VBA macro is converted to Python 3.x and run, you get the following output:

VBA-fig7.png 

The object it wanted to create was Wscript.Shell, and the .Exec method was spawning a PowerShell script – which also has its own encryption.

Sample 3 (ddcbcf91d98ac04ffbc90ff597bab6263c69eded) again raises some issues when you want to convert the code automagically to Python. This time it looks like there is a lot of data waiting to be decrypted – but it’s not there. Once again, this is to confuse humans and algorithms trying to decode or x-ray ‘data’.

VBA-fig8.png 

You’ll see a lot of variables being set to ‘random’ data, which you might assume will be decrypted at some point. Instead, a function, KC_U, is invoked further into the Workbook_Open macro, which looks like this:

VBA-fig9.png 

There are two main challenges here:

  1. GoTo doesn’t exist in Python, but the Python universe seems unlimited and someone has written a nice goto package [1], which I decided to test. However, there’s one problem as this package patches the byte code of the function: it doesn’t seem to ‘see’ beyond the VBA ‘Exit Sub’, which normally would be translated as ‘return’.

    As I automatically rewrite the code to Python 3.x, I modify the code so the exit would always be at the end, thus solving that problem.

  2. The raw data that is needed as input to the decryption comes from the Workbook sheet, as a comment to a cell. The raw data looks like this (TxO):

    VBA-fig10.png
    A TxO record in the Workbook stream seems to follow an MsoDrawing object, and the Obj record describing this uses type 0x19 (Note) to Obj 1.

    VBA-fig11.png

 In the Python 3 world, the function KC_U will look like this (with the @goto support):

 VBA-fig12.png
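Where patching byte code is not desirable, a label-dispatch loop is one alternative way to emulate GoTo. This is a different technique from the @goto decorator used above, and the sketch below is not the sample's KC_U: the loop body and the label names are hypothetical, chosen only to show a backward jump and a single exit point at the end.

```python
def kc_u_sketch():
    """Emulate VBA labels with a dispatch loop (illustrative, not the real KC_U).

    Hypothetical VBA being emulated:
            i = 0
        again:
            i = i + 1
            If i < 3 Then GoTo again
            Exit Sub
    """
    trace = []
    i = 0
    label = "start"
    while label != "exit":
        if label == "start":
            i = 0
            label = "again"            # fall through to the next label
        elif label == "again":
            i += 1
            trace.append(i)
            # If i < 3 Then GoTo again
            label = "again" if i < 3 else "done"
        elif label == "done":
            label = "exit"             # the only exit, placed at the end
    return trace


print(kc_u_sketch())  # [1, 2, 3]
```

Each label becomes one branch of the loop, and a GoTo becomes an assignment to `label`, so the generated function needs no extra package.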

When, at the end, we run the generated Python 3 code, we get the behaviour of the VBA macro spelled out:

VBA-fig13.png 

Sample 4 (f5858eb5772eba0b6c066aebdd1efbdefed71a6a) is probably the most complex sample to convert automatically that I’ve seen so far. I show a lot of the converted code at [2]. I also wrote a blog post about sample 5 (6cd67f6ce51c3a57f5d9a65415780ee8ef9ee44c) [3], which leads me on to the application world that is needed to support the converted Python code. As you see, there are lots of references to the Office VBA world, and we need to replicate that so that the code works.

 

The VBA application world

As you’ve seen in my generated Python code, I need to create something that resembles the VBA object-model so the VBA API fits well with the Python world. This means generating an application object for Excel or Word that can provide the support needed to access cell information, document paragraph data, etc. Each sheet within the Excel document also needs to be created as an object that supports what the macro expects of it. UserForm objects and variables in VBA macros that are initialized to values need to be initialized at the right time before use so the VBA macro can use the data as ‘normal’.

In addition to all of this, regular simple built-in functions need to be exported, such as:

Len, StrConv, Left, Right, InStrRev, Replace, DoEvents, LBound, UBound, Now, TimeSerial, Environ, Close, ChDir, MkDir, Shell, CreateObject, Asc, Int, Chr, Mid, Out, CallByName etc.

You’ll also need to support used constants, but these are easy to find and export to the generated code.
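A handful of these built-ins might be exported along the following lines. This is a minimal sketch covering only the common call forms; optional arguments such as the start position of InStrRev are left out, and `Chr` uses Unicode code points, which is a simplification for values above 127.

```python
def Len(value):
    return len(value)


def Left(text, count):
    return text[:count]


def Right(text, count):
    # text[-0:] would return the whole string, so handle zero explicitly.
    return text[-count:] if count > 0 else ""


def Mid(text, start, length=None):
    """VBA Mid is 1-based and the length argument is optional."""
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]


def Asc(text):
    return ord(text[0])


def Chr(code):
    return chr(code)


def InStrRev(text, needle):
    """Return the 1-based position of the last match, or 0 if absent."""
    return text.rfind(needle) + 1
```

Keeping the VBA names and the 1-based indexing means the generated code can call these helpers without any index arithmetic being rewritten.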

None of them are hard to write. CreateObject needs to make a new class based on the name the potential malware wants (e.g. Wscript.Shell, Scripting.FileSystemObject, Microsoft.XMLHTTP, Adodb.Stream, etc.). These objects need to deliver methods the sample can use, like:

VBA-fig14.png 
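A minimal sketch of such a CreateObject factory is shown below. The class names, the `log` list and the `REGISTRY` table are hypothetical, and only two objects with a couple of methods each are covered. The stand-ins record what the sample asks for and perform nothing, and `Exec` returns None here, which is a simplification.

```python
class WScriptShell:
    """Stand-in for WScript.Shell: reports commands instead of running them."""

    def __init__(self, log):
        self.log = log

    def Exec(self, command):
        self.log.append(("WScript.Shell.Exec", command))

    def Run(self, command, *args):
        self.log.append(("WScript.Shell.Run", command))


class XmlHttp:
    """Stand-in for Microsoft.XMLHTTP: records the request, sends nothing."""

    def __init__(self, log):
        self.log = log

    def Open(self, method, url, *args):
        self.log.append(("XMLHTTP.Open", f"{method} {url}"))

    def Send(self, body=None):
        self.log.append(("XMLHTTP.Send", body))


class ApplicationWorld:
    # ProgIDs are matched case-insensitively, so keys are lower case.
    REGISTRY = {
        "wscript.shell": WScriptShell,
        "microsoft.xmlhttp": XmlHttp,
    }

    def __init__(self):
        self.log = []

    def CreateObject(self, prog_id):
        self.log.append(("CreateObject", prog_id))
        factory = self.REGISTRY.get(prog_id.lower())
        if factory is None:
            raise NotImplementedError(f"no stand-in for {prog_id}")
        return factory(self.log)
```

After the converted macro has run, the entries in `log` are the behaviour report; an unknown ProgID fails loudly so the missing stand-in can be added.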

Before the real classes are defined for the VBA macro streams, Python needs to know about them for the first pass (it doesn’t have to understand what they are, just know that they are there) and UserForm classes (if applicable) need to be created and initialized. This is an example of a complete rewritten simple VBA macro in Python 3.x form:

VBA-fig15.png 

 

Conclusions

After quite a few hours spent on this ‘fun’ project I’ve learned a lot of lessons. Languages are complicated and moving the same logic from one language to another can’t be done in a hurry.

Let me run through a few of the challenges:

  • Arrays in VBA aren’t indexed with ‘[’ – you’ll need to figure out what variable is being referenced and its size to determine if a ‘[’ is needed for the Python world.

  • Calculations that VBA handles fine as double/floats even though they are stored in Long will cause problems in Python when you want to slice something based on a double/float. You’ll need to find the right time and place to convert it to an int(). Not too soon, as it might affect the calculation/result (which could cause out-of-buffer access), and not too late.

  • Calling subroutines and certain APIs in VBA doesn’t require ‘()’ around parameters – you’ll need to figure out what is what.

  • Referencing local variables when they are in a Python class means some ‘self.’ references need to be inserted so you always reference the right object. You also need to make sure to declare global variables, so you make sure e.g. UserForm access is via the same expected object.

  • I wrote two tokenizers for each line in order to handle ‘complicated’ expressions, e.g. is this a function call, and if so, where do we insert the ‘()’ for the parameters?
    VBA-fig16.png
    •  Split each element of a line into separate tokens, for example:
['CreateObject', '(', 'self.rd165a9f386b4b', '(', '"696575847B828640657A777E7E"', ')', ')', '.', 'Exec',
'self.rd165a9f386b4b', '(', 'ThisWorkbook', '.', 'Sheets', '(', '"ZAOIQ"', ')', '.', 'Range', '(', '"G135"',
')', '.', 'Value', ')']
    • Group all tokens into logical elements/units, for example:
['CreateObject( self.rd165a9f386b4b( "696575847B828640657A777E7E")) . Exec', 
'self.rd165a9f386b4b(ThisWorkbook.Sheets("ZAOIQ").Range("G135").Value)']
  • I wrote a common analyser for each line to determine what the ‘overall’ purpose is, then to send it to a concrete handler that could replace that purpose with one or more lines for the desired language. For instance, ‘For’ needs to be tailored to the specific use-case, as well as ‘Select’ and ‘Use’.

  • Some lines need to be completely rewritten and for that reason I wrote a rule engine to recognize the challenges and convert the code to the desired output (e.g. VBA lets you find a file number and open a file via it – not very Python-like). Good thing this isn’t asm, where you can easily inspect the bytecode as you run it to find out if everything is ok.

  • VBA allows strings to be appended by ‘&’, whereas the Python string class doesn’t.

  • VBA uses ‘=’ for both assignment and comparison, so ‘If something = 1 Then’ has to become ‘if something == 1:’ in Python. A blind replace doesn’t work well, though: ‘a == 1+2’ doesn’t set a to a value, it returns a Boolean state.

  • VBA uses ‘&H’ as the prefix for hex literals, not ‘0x’.

  • VBA has a keyword for Xor, not ^.

  • Exception handling (like On Error Resume Next) needs to be handled with try/except – but on logical units – like the entire ‘if/else and body’ so it resumes on the next logical line.

  • VBA does not require indentation, so the output needs to match Python’s expectations. The good thing is that VBA does use End If, Wend, Next and such for control, so it’s relatively easy to understand when a change of indentation is needed. And of course, add a ‘:’ when you do.

  • VBA declares all variables with Dim and often type. This is useful if you get the type which helps you understand if data needs to be converted to fit the variable. Python needs some other types, especially with arrays of bytes.
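Several of the operator and literal differences listed above can be handled at token level. The sketch below assumes the line has already been split into tokens; the function name `translate_tokens` and the `in_condition` flag are hypothetical. Mapping ‘&’ to ‘+’ is only valid when both operands are already strings, so a real converter would also need to insert str() calls, and VBA's Xor has a lower precedence than Python's ‘^’, so parentheses may be needed as well.

```python
import re

# VBA hex literal such as &HFF, optionally followed by a type suffix '&'.
HEX_LITERAL = re.compile(r"&H([0-9A-Fa-f]+)&?")


def translate_tokens(tokens, in_condition=False):
    """Rewrite VBA operator and literal tokens to Python equivalents."""
    out = []
    for token in tokens:
        lowered = token.lower()
        hex_match = HEX_LITERAL.fullmatch(token)
        if token.startswith('"'):
            out.append(token)                    # string literal: leave untouched
        elif hex_match:
            out.append("0x" + hex_match.group(1))
        elif lowered == "xor":
            out.append("^")
        elif lowered == "mod":
            out.append("%")
        elif token == "<>":
            out.append("!=")
        elif token == "&":
            out.append("+")                      # assumes string operands
        elif token == "=" and in_condition:
            out.append("==")                     # comparison, not assignment
        else:
            out.append(token)
    return " ".join(out)


print(translate_tokens(["a", "=", "b", "Xor", "&HFF"]))
print(translate_tokens(["something", "=", "1"], in_condition=True))
```

The caller decides whether a line is a condition (for example, the part between If and Then), which is what keeps ‘=’ from being replaced blindly.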

I’ve also seen samples that have a very short VBA macro which then continues with XF:

Private Sub Auto_Open()
Application.Run Sheets("Brisk").Range("CD5")
End Sub

Application.Run is a call to the application-world to run XF code from the sheet ‘Brisk’ from the ‘CD5’ location. This means the ‘Run’ function will need to translate the XF code to Python as well – and this will be the next project.
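Until XF translation exists, the application-world can at least report what the macro asked it to run. The sketch below is a hypothetical `Application` stub: the `macros` table, the `events` list and the "unresolved" marker are assumptions for illustration, not part of vba2python.

```python
class Application:
    """Hypothetical application-world stub that reports Run targets."""

    def __init__(self, macros=None):
        # Optional table of already-converted macros, keyed by name.
        self.macros = dict(macros or {})
        self.events = []

    def Run(self, target, *args):
        name = str(target)
        self.events.append(("Application.Run", name))
        handler = self.macros.get(name)
        if handler is None:
            # Nothing converted for this target (e.g. XF code in a cell):
            # record it so the analyst can see what would have been run.
            self.events.append(("unresolved", name))
            return None
        return handler(*args)
```

A call such as `Application().Run("Brisk!CD5")` then leaves two entries in `events`, the request itself and the note that nothing could be executed for it.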

These lessons account for many of the issues faced, and the rest is pain as you go – but the initial results, and the speed (a few milliseconds) with which malicious VBA macros can be run on any platform, give me confidence that this could be useful in many situations and is worth the hours spent.

 

References

[1] https://pypi.org/project/goto-statement/

[2] https://libnotfound.com/2021/03/24/automatically-generate-python-3-x-from-malicious-vba-macros/

[3] https://libnotfound.com/2021/03/10/running-vba-as-python-part-2/

 

 
