Automatic Classifying of Mac OS X Samples

Friday 7 October 14:30 - 15:00, Small talks

Spencer Hsieh (Trend Micro)
Pin Wu (Trend Micro)
Haoping Liu (Trend Micro)

(This VB2016 reserve paper will be presented on Friday 7 October at 14:30 in the Small Talks room unless otherwise required on the main programme.)

As the volume of malware grows rapidly, the security industry has struggled for many years to improve automatic malware classification. Many recent market research reports suggest that the growth of Apple's Mac OS X has outpaced that of the PC for several years, and this shift is attracting more malware authors to the platform. In this paper, we present a study of classifying Mac OS X malware using a set of features extracted from Mach-O metadata, together with features derived from that metadata, on a collection of samples submitted to VirusTotal during 2015-16.

Like the PE format for Windows, the Mach-O format provides a variety of features for classification. We collected more than 300,000 Mach-O samples submitted to VirusTotal during 2015-16, and filtered out irrelevant samples, such as those built for iOS or only for PowerPC. We then generated metadata for the Mach-O files using tools such as nm, otool and strings. Meta information from the sample files, such as segment and section structures, functions imported from dynamic libraries, and printable strings, is used as features for classifying Mac OS X samples. Additionally, we include derivative numerical features computed from the meta information, such as function call distribution and structure complexity, which have been widely used in recent research on learning-based malware classification.
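To make the idea of derivative numerical features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the metadata has already been extracted (for example from the output of nm, otool and strings) into a plain dictionary. The dictionary layout, the category prefixes and the function names are all hypothetical, and string entropy is included only as one possible extra numerical feature that the abstract does not name.

```python
import math
from collections import Counter

# Hypothetical symbol-name prefixes used to bucket imported functions.
# The abstract does not list its categories; these are illustrative only.
CATEGORY_PREFIXES = {
    "network": ("_socket", "_connect", "_send", "_recv"),
    "process": ("_fork", "_exec", "_posix_spawn", "_system"),
    "file": ("_open", "_fopen", "_unlink", "_chmod"),
}


def import_distribution(symbols):
    """Return the fraction of imported symbols that fall in each category."""
    counts = Counter()
    for sym in symbols:
        for category, prefixes in CATEGORY_PREFIXES.items():
            if sym.startswith(prefixes):
                counts[category] += 1
                break
        else:
            # No category matched this symbol.
            counts["other"] += 1
    total = sum(counts.values())
    categories = list(CATEGORY_PREFIXES) + ["other"]
    return {c: (counts[c] / total if total else 0.0) for c in categories}


def string_entropy(strings):
    """Shannon entropy, in bits per character, over all printable strings."""
    text = "".join(strings)
    if not text:
        return 0.0
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in Counter(text).values())


def derive_features(meta):
    """Turn extracted metadata (assumed layout) into a flat numeric feature dict."""
    segments = meta.get("segments", {})
    imports = meta.get("imports", [])
    features = {
        # Simple structure-complexity proxies: segment and section counts.
        "n_segments": len(segments),
        "n_sections": sum(len(secs) for secs in segments.values()),
        "n_imports": len(imports),
        "string_entropy": string_entropy(meta.get("strings", [])),
    }
    for category, fraction in import_distribution(imports).items():
        features["import_frac_" + category] = fraction
    return features
```

A call such as `derive_features({"segments": {"__TEXT": ["__text"]}, "imports": ["_socket"], "strings": ["/bin/sh"]})` returns one flat dictionary per sample, which can then be turned into a row of a feature matrix.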

This study summarizes the statistical changes in Mac OS X malware families, and the structural trends in benign and malicious samples, between 2015 and 2016. Based on our collection of more than 300,000 files and over 1,000 malicious samples, we evaluate features by analysing the composition of different malware families in terms of both meta and derivative features. We use a variety of classification algorithms to build predictive models from the 2015 dataset, test them on the 2016 samples, and compare the results with AV vendors' detections on VirusTotal. We also discuss the effectiveness of the selected features by ranking their importance in the predictive models across our classification tests on the 2015-16 dataset.
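The evaluation protocol described above, fitting on 2015 samples, holding out 2016 samples, and ranking features, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which classifiers or which importance measure were used, so information gain on binary features stands in here as a simple, model-agnostic ranking. The sample layout (`year`, `label`, `features`) and all function names are hypothetical.

```python
import math
from collections import Counter


def temporal_split(samples, train_year=2015, test_year=2016):
    """Split samples by submission year so testing uses only later data."""
    train = [s for s in samples if s["year"] == train_year]
    test = [s for s in samples if s["year"] == test_year]
    return train, test


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one binary (0/1) feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def rank_features(train):
    """Rank features on the training year only, highest information gain first."""
    rows = [s["features"] for s in train]
    labels = [s["label"] for s in train]
    names = sorted(rows[0])
    scores = {name: information_gain(rows, labels, name) for name in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Ranking on the training year alone keeps the 2016 samples untouched until testing, which mirrors the train-on-2015, test-on-2016 design described in the abstract.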



Spencer Hsieh

Spencer Hsieh is a threat researcher at Trend Micro. He joined the company's Threat Solution Research team in 2009. His areas of expertise include cyber threats, incident response, investigation of targeted attacks, malware analysis and exploitation techniques. His current research focuses on emerging threats and targeted attacks.


Pin Wu

Pin Wu is a machine learning engineer at Trend Micro. He joined Trend Micro in 2012, and his research concentrates on machine learning and pattern recognition. His current work focuses on learning-based systems and data analytics in security applications.



Haoping Liu

Haoping Liu started working in the Internet security industry in 2010, focusing on big data processing and analysis. Currently his research interests include malware classification and social media account reputation.