Using statistical analysis of DNS traffic to identify infections of unknown malware

Brandon J. Niemczyk Hewlett-Packard
Jonathan Andersson Hewlett-Packard

The ubiquity of the DNS protocol, combined with the fact that it is rarely encrypted, can provide a unique view into the activity of a network. Our hypothesis is that a malicious program's use of DNS will be identifiably different from legitimate use. Fast-flux hosting and pseudo-random domain generation are two immediate examples of where typical DNS usage by malware diverges from legitimate use.
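
The abstract does not spell out the fuzzy rules used to spot pseudo-random domains. As a minimal sketch, assuming a single hypothetical rule of the form "high entropy AND (long label OR few vowels)", a fuzzy score for a second-level domain could be computed as below. The features, thresholds and the names `pseudo_random_score` and `ramp` are illustrative assumptions, not the authors' classifier; fuzzy AND and OR are taken as minimum and maximum.

```python
import math
from collections import Counter


def second_level_label(domain: str) -> str:
    """Return the label just left of the TLD ('example' for 'www.example.com').
    Naive: ignores multi-label public suffixes such as 'co.uk'."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def ramp(x: float, low: float, high: float) -> float:
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def pseudo_random_score(domain: str) -> float:
    """Degree (0..1) to which the second-level label looks machine-generated.
    Thresholds below are hypothetical, chosen only for illustration."""
    label = second_level_label(domain)
    letters = [ch for ch in label if ch.isalpha()]
    vowel_ratio = sum(ch in "aeiou" for ch in letters) / len(letters) if letters else 0.0

    high_entropy = ramp(shannon_entropy(label), 2.5, 3.5)
    long_label = ramp(len(label), 8, 16)
    few_vowels = 1.0 - ramp(vowel_ratio, 0.2, 0.4)

    # Rule: high_entropy AND (long_label OR few_vowels), using min/max operators.
    return min(high_entropy, max(long_label, few_vowels))
```

With these illustrative thresholds, `pseudo_random_score("google.com")` returns 0.0 and `pseudo_random_score("xkqzvbtrwpmnhd.com")` returns 1.0. A real deployment would tune the membership functions against labelled data.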

This presentation will cover the following key points:

Using fuzzy logic to classify second-level domains generated by a pseudo-random algorithm.
Identifying fast-flux domains by observing responses.
Using requests for fast-flux domains, pseudo-random domains, known blacklisted domains and known whitelisted domains to build labelled training datasets of good/bad requests and of infected/unknown-status hosts.
Building distributions of behaviour exhibited by infected hosts and hosts without a known infection.
Practical experience of gathering and storing enough DNS data to perform the analysis.
Using these results to identify the key components that indicate whether a host should be categorized as 'infected'.
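
A similarly hedged sketch of the fast-flux and labelling steps: a domain is flagged when its observed A records are numerous, spread across many /16 networks and served with short TTLs, and any host that queried a flagged domain is labelled 'infected'. The `DnsAnswer` record, the thresholds, and the use of IPv4 /16 prefixes as a stand-in for network diversity (detectors often use ASNs instead) are all assumptions. In practice the bad-domain set would also include pseudo-random and blacklisted domains.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DnsAnswer:
    """Hypothetical record of one observed DNS response."""
    client: str           # IP of the host that sent the query
    qname: str            # queried domain
    a_records: list[str]  # IPv4 addresses in the answer section
    ttl: int              # smallest TTL in the answer, in seconds


def fast_flux_domains(answers, min_ips=10, min_prefixes=5, max_ttl=300):
    """Flag domains with many short-lived A records spread over many /16 networks.
    Default thresholds are illustrative, not from the paper."""
    ips = defaultdict(set)
    min_ttl = {}
    for a in answers:
        ips[a.qname].update(a.a_records)
        min_ttl[a.qname] = min(a.ttl, min_ttl.get(a.qname, a.ttl))

    flagged = set()
    for qname, addrs in ips.items():
        # Collapse each address to its enclosing /16 network.
        prefixes = {ipaddress.ip_network(f"{ip}/16", strict=False) for ip in addrs}
        if (len(addrs) >= min_ips
                and len(prefixes) >= min_prefixes
                and min_ttl[qname] <= max_ttl):
            flagged.add(qname)
    return flagged


def label_hosts(answers, bad_domains):
    """Label a host 'infected' if it queried any bad domain, else 'unknown_status'."""
    labels = {}
    for a in answers:
        if a.qname in bad_domains:
            labels[a.client] = "infected"
        else:
            labels.setdefault(a.client, "unknown_status")
    return labels
```

For example, a domain answered with twelve addresses in twelve different /16 networks and a 60-second TTL is flagged, while one answered with two addresses in the same /16 and a one-hour TTL is not. Hosts without a hit stay 'unknown_status' rather than 'clean', matching the abstract's labelling.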

VB2013 takes place 2-4 October 2013 in Berlin, Germany.
