Detecting malicious documents with combined static and dynamic analysis

Carsten Willems University of Mannheim
Thorsten Holz University of Mannheim
Markus Engelberth University of Mannheim


Malicious documents, i.e. documents that contain a malicious payload (e.g. a keylogger), have recently become a serious threat. This attack vector is often used in targeted attacks against government, military and other high-profile targets.

For example, the attacks against European governments and US defence organisations in 2007 and 2008 were based on malicious Microsoft Office documents. More recently, targeted attacks involving malicious Acrobat Reader documents that contained a zero-day exploit have been reported. In this talk, we present our approach for detecting the potential security risks posed by arbitrary (i.e. non-executable) data in documents. We achieve this through a novel combination of static scanning and dynamic behaviour-based analysis techniques.

The scanning technique allows us to find malicious artifacts in files, such as embedded PE32 files, known exploit code, or similar anomalies in the document. The dynamic approach opens the document under analysis in its associated application (for example, .doc files are opened in Microsoft Word and .pdf files in Acrobat Reader). Since some exploits only trigger in particular application versions, we run many different instances of these client applications in parallel. The monitoring is carried out in a 'sandbox' environment that allows us to observe suspicious actions that may happen when the document is opened. These include, for example, the creation of files, the spawning of processes, outgoing network connections, interference with other processes, or, in the extreme case, a crash of the client application (e.g. due to a wrong offset in the malicious document's exploit code). During the analysis phase, we also emulate the typical behaviour of a human user in order to trigger exploit code that depends on user interaction.
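To make the static scanning step more concrete, the following is a minimal sketch of one way to look for an embedded PE32 file inside a document's raw bytes. It is not the scanner used in our system; the function name `find_embedded_pe` is hypothetical, and the heuristic (an "MZ" DOS header whose `e_lfanew` field points at a "PE\0\0" signature) is an assumption chosen for illustration. It would miss payloads that are encoded, compressed or encrypted inside the document.

```python
import struct


def find_embedded_pe(data: bytes) -> list[int]:
    """Return offsets at which a plausible PE image starts inside `data`.

    Hypothetical helper for illustration only: it checks for an 'MZ' DOS
    header whose e_lfanew field (at offset 0x3C) points to a 'PE\\0\\0'
    signature. Obfuscated or encoded payloads are not detected.
    """
    offsets = []
    pos = data.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes long; e_lfanew is its last field.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_start = pos + e_lfanew
            # An out-of-range pe_start yields an empty slice and no match.
            if data[pe_start:pe_start + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets
```

A non-empty result would be treated as a static finding to be weighed together with the behaviour observed in the sandbox, not as a verdict on its own.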

Finally, we combine the results of the static scanning and the dynamic analysis phases. This enables us to decide whether a specific document is potentially harmful, actually harmful, or probably harmless. In our experiments with thousands of documents, the system detected 100% of the malicious documents we tested and produced a false positive rate of 0%.
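The combination step can be illustrated with a short sketch. The three verdict categories are those named above, but the decision rule is an assumption made for illustration: observed suspicious behaviour is taken as evidence that a document is actually harmful, static findings alone as evidence that it is potentially harmful. The names `Verdict` and `combine` are hypothetical, and the actual system may weigh its evidence differently.

```python
from enum import Enum


class Verdict(Enum):
    PROBABLY_HARMLESS = "probably harmless"
    POTENTIALLY_HARMFUL = "potentially harmful"
    ACTUALLY_HARMFUL = "actually harmful"


def combine(static_findings: list[str], dynamic_events: list[str]) -> Verdict:
    """Merge static and dynamic results into one verdict.

    Assumed rule (not taken from the described system): any suspicious
    behaviour seen in the sandbox outweighs static findings.
    """
    if dynamic_events:
        return Verdict.ACTUALLY_HARMFUL
    if static_findings:
        return Verdict.POTENTIALLY_HARMFUL
    return Verdict.PROBABLY_HARMLESS
```

For instance, `combine(["embedded PE32 file"], [])` yields `Verdict.POTENTIALLY_HARMFUL` under this rule, because the artifact was found but no suspicious behaviour was observed when the document was opened.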

In future work, we plan to integrate our analysis suite tightly with common mail servers in order to check all attachments of incoming emails automatically before they are relayed to their final destination. Such a tool could then protect high-profile targets from targeted attacks that use malicious documents as an attack vector.