A fast and precise malicious PDF filter

Wei Xu Palo Alto Networks
Xinran Wang Palo Alto Networks
Huagang Xie Palo Alto Networks
Yanxin Zhang Palo Alto Networks


PDF has become a popular vector for malware distribution and other malicious activities. Although malicious PDF documents are prevalent in the wild, existing approaches for detecting them, or the malicious content within them, are limited by their run-time performance and scalability. To address this issue, we propose a fast and precise malicious PDF filter.

Based on our analysis of the characteristics of malicious PDF documents, we extract a set of novel and predictive features, such as malformed cross-references and suspicious filter pipelines. To the best of our knowledge, over a dozen of the proposed features have not appeared in previous work.
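As an illustration only, the following minimal sketch shows how two features of this kind might be computed from the raw bytes of a document. The abstract does not define the exact checks, so the function name `extract_features`, the feature names, and the chain-length cut-off of 3 are all hypothetical assumptions, and the regular expressions are a deliberate simplification of real PDF parsing.

```python
import re


def extract_features(data: bytes) -> dict:
    """Return a dict of 0/1 features (hypothetical names, simplified checks)."""
    features = {}

    # Malformed cross-reference (simplified): the offset that follows the last
    # "startxref" keyword should point at an "xref" table or at the
    # "N G obj" header of a cross-reference stream.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if not offsets:
        features["malformed_xref"] = 1
    else:
        offset = int(offsets[-1])
        target = data[offset:offset + 32]
        well_formed = target.startswith(b"xref") or re.match(
            rb"\d+\s+\d+\s+obj", target
        )
        features["malformed_xref"] = 0 if well_formed else 1

    # Suspicious filter pipeline (simplified): a /Filter array that chains
    # several decoders. The cut-off of 3 is an assumption for this sketch.
    longest_chain = 0
    for match in re.finditer(rb"/Filter\s*\[([^\]]*)\]", data):
        longest_chain = max(longest_chain, len(re.findall(rb"/\w+", match.group(1))))
    features["long_filter_chain"] = 1 if longest_chain >= 3 else 0

    # Embedded code: presence of a JavaScript action key.
    features["has_javascript"] = 1 if re.search(rb"/(JavaScript|JS)\b", data) else 0
    return features
```

A production filter would need a tolerant PDF parser rather than pattern matching, since malicious documents often obfuscate names and object boundaries.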

We also propose a systematic classification of features to cover various aspects (i.e., document structure, embedded code, and PDF functionality) of a malicious PDF document. To better leverage these features using machine learning techniques, we studied the trade-offs between performance and accuracy for different machine learning models and chose a linear model for the filter. In the implementation, we tuned the system based on the predictivity of different features, the strength of different models, and the feedback from the training phase to maintain high accuracy. This tuning process can also adjust the system to serve various practical purposes, e.g., as a pre-filter in a multi-level detection system or as a standalone intrusion detection module. Our evaluation on over 25,000 labelled PDF documents and over 150,000 real-world PDF documents demonstrates both a low false positive rate and high detection accuracy. It also shows that the performance of the filter is suitable for online scenarios such as residing in a firewall.
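To make the idea of a tunable linear filter concrete, here is a minimal sketch that uses a logistic scoring function as a stand-in linear model. The abstract does not say which linear model was used, so this choice is an assumption, as are the names `WEIGHTS`, `BIAS`, `score`, and `classify`. The weights are invented for illustration and are not taken from the evaluation.

```python
import math

# Hypothetical weights; a real filter would learn these from labelled data.
WEIGHTS = {"malformed_xref": 2.0, "long_filter_chain": 1.5, "has_javascript": 1.0}
BIAS = -2.5


def score(features: dict) -> float:
    """Map a feature dict to a value in (0, 1) with a logistic linear model."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: dict, threshold: float = 0.5) -> bool:
    """Flag the document as suspicious when its score reaches the threshold."""
    return score(features) >= threshold


# A document whose only signal is embedded JavaScript.
sample = {"malformed_xref": 0, "long_filter_chain": 0, "has_javascript": 1}
standalone_verdict = classify(sample, threshold=0.5)   # stricter: fewer false positives
prefilter_verdict = classify(sample, threshold=0.15)   # looser: pass more to later stages
```

Lowering the threshold suits a pre-filter that hands suspicious documents to slower, deeper analysis, while a higher threshold suits a standalone module where false positives are more costly.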

