Taxonomy of web-based malware: finding rules for heuristic detection

Fraser Howard and Vanja Svajcer, Sophos

Scripts embedded in HTML pages have been responsible for a large proportion of wide-scale malware attacks. The wide accessibility of vulnerable web applications, combined with a significant number of unpatched vulnerabilities in browsers and browser plug-ins, has provided fertile ground for malware delivery. Web-based threats constantly change in nature and complexity, with new obfuscation techniques and payload delivery methods appearing daily.

Yet most existing work on the classification and taxonomy of malware concentrates on Windows executables. This paper aims to provide a useful classification of web-based script threats, based on well-defined attributes of the current set of malicious scripts.

The paper discusses the results of various script analysis methods, with the objectives of classifying web-based script malware and discovering attributes that can safely be used to construct heuristic detection rules, with an emphasis on minimising false positive detections. The suitability of the discovered attributes is verified using statistical and machine-learning methods.
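The idea of checking whether an attribute is "safe" for a heuristic rule can be illustrated with a minimal sketch, assuming a labelled set of malicious and clean scripts: measure how often each candidate attribute fires on each set, and keep only attributes with a high enough detection rate and an acceptably low false positive rate. The attribute names, regular expressions, thresholds (`max_fp_rate`, `min_detection`) and toy samples below are all hypothetical illustrations, not rules or data from this paper, and the statistical and machine-learning verification described above is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical candidate attributes: each maps script source text to True/False.
# Illustrative only; these are NOT the rules derived in the paper.
ATTRIBUTES: Dict[str, Callable[[str], bool]] = {
    "eval_of_unescape": lambda s: re.search(r"eval\s*\(\s*unescape\s*\(", s) is not None,
    # A run of at least 20 %XX or %uXXXX escapes, as often seen in encoded payloads.
    "long_percent_encoding": lambda s: re.search(
        r"(?:%u[0-9a-fA-F]{4}|%[0-9a-fA-F]{2}){20,}", s) is not None,
    # An iframe with zero width or height.
    "hidden_iframe": lambda s: re.search(
        r"<iframe[^>]*(width|height)\s*=\s*[\"']?0\b", s, re.I) is not None,
    "uses_document_write": lambda s: "document.write" in s,
}


@dataclass
class AttributeStats:
    name: str
    detection_rate: float  # fraction of malicious samples where the attribute fires
    fp_rate: float         # fraction of clean samples where the attribute fires


def evaluate(malicious: List[str], clean: List[str]) -> List[AttributeStats]:
    """Compute detection and false positive rates for every candidate attribute."""
    if not malicious or not clean:
        raise ValueError("need at least one malicious and one clean sample")
    stats = []
    for name, fires in ATTRIBUTES.items():
        mal_hits = sum(1 for s in malicious if fires(s))
        clean_hits = sum(1 for s in clean if fires(s))
        stats.append(AttributeStats(name, mal_hits / len(malicious), clean_hits / len(clean)))
    return stats


def select_safe(stats: List[AttributeStats], max_fp_rate: float = 0.0,
                min_detection: float = 0.1) -> List[str]:
    """Keep attributes whose false positive rate is within budget and that detect enough."""
    return [s.name for s in stats
            if s.fp_rate <= max_fp_rate and s.detection_rate >= min_detection]


# Toy samples (hypothetical, for illustration only).
MALICIOUS_SAMPLES = [
    'eval(unescape("' + "%41" * 30 + '"))',
    'document.write(unescape("' + "%u9090" * 25 + '"))',
    '<iframe src="http://example.invalid/" width="0" height="0"></iframe>',
]
CLEAN_SAMPLES = [
    'document.write("Hello, world");',
    "var x = encodeURIComponent(name);",
]

if __name__ == "__main__":
    print(select_safe(evaluate(MALICIOUS_SAMPLES, CLEAN_SAMPLES)))
    # prints ['eval_of_unescape', 'long_percent_encoding', 'hidden_iframe']
```

On these toy samples, `uses_document_write` is rejected because it also fires on a clean script (false positive rate 0.5), which mirrors the emphasis on minimising false positives: a generic attribute may still be useful as one input to a combined rule, but not as a standalone heuristic.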

The research also documents various threat characteristics, including the most commonly used attack methods, delivery methods, delivered payloads, size of distribution networks, obfuscation methods, geographic locality, delivered exploits and shellcode, the number of exploits per attack site and the number of attacks per compromised site. The presentation will include case studies and a demonstration of a web threat analysis system.