Using statistical analysis of DNS traffic to identify infections of unknown malware
Brandon J. Niemczyk, Hewlett-Packard
Jonathan Andersson, Hewlett-Packard
The ubiquity of the DNS protocol, combined with the fact that it is rarely encrypted, can provide a unique view into the activity of a network. Our hypothesis is that a malicious program's use of DNS will be identifiably different from legitimate use. Fast-flux and pseudo-random domain generation are two immediate examples of where typical DNS usage by malware diverges from legitimate use.
This presentation will cover the following key points:
- Using fuzzy logic to classify second-level domains generated by a pseudo-random algorithm.
- Identifying fast-flux domains by observing their DNS responses.
- Using requests for fast-flux domains, pseudo-random domains, known blacklisted domains and known whitelisted domains to build labelled training datasets of good/bad requests and of infected/unknown-status hosts.
- Building distributions of behaviour exhibited by infected hosts and hosts without a known infection.
- Practical experience of gathering and storing sufficiently large volumes of DNS data to perform the analysis.
- Using these results to identify the key factors that determine whether a host can be categorized as 'infected'.
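The abstract does not say which features or membership functions the fuzzy classifier uses. As a minimal sketch, assuming three hand-picked features of the second-level label (character entropy, vowel ratio and digit ratio), linear "ramp" membership functions and the standard min/max operators for fuzzy AND/OR, a pseudo-random score could be computed as follows. The function names, features, rule and thresholds are all hypothetical.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def ramp(x: float, lo: float, hi: float) -> float:
    """Linear membership: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def dga_membership(sld: str) -> float:
    """Degree (0..1) to which a second-level label looks pseudo-random.

    Hypothetical rule: RANDOM if entropy is high AND (few vowels OR many digits).
    Fuzzy AND is min(), fuzzy OR is max(). Thresholds are illustrative only.
    """
    label = sld.lower()
    letters = [ch for ch in label if ch.isalpha()]
    digit_count = sum(ch.isdigit() for ch in label)
    vowel_ratio = sum(ch in VOWELS for ch in letters) / max(len(letters), 1)
    digit_ratio = digit_count / max(len(label), 1)

    high_entropy = ramp(shannon_entropy(label), 2.5, 3.5)
    few_vowels = 1.0 - ramp(vowel_ratio, 0.2, 0.35)
    many_digits = ramp(digit_ratio, 0.1, 0.3)
    return min(high_entropy, max(few_vowels, many_digits))
```

With these illustrative settings, `dga_membership("google")` returns 0.0 and `dga_membership("xkcdqzvbrtpl")` returns 1.0. The input is assumed to be the second-level label alone; extracting it reliably from a full query name needs a public-suffix list, and the ramp boundaries would have to be tuned against labelled domains.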
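The abstract leaves open which properties of the responses are observed when identifying fast-flux domains. A minimal sketch, assuming the commonly cited fast-flux traits of short TTLs, many distinct A records over time and addresses spread across many networks, could count how many of those indicators a domain exhibits. The `ARecordAnswer` type, the thresholds and the use of IPv4 /16 prefixes as a crude stand-in for network diversity are assumptions for illustration only.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ARecordAnswer:
    """Hypothetical record of one observed DNS answer for a domain."""
    ttl: int
    addresses: List[str]


def fast_flux_score(answers: Iterable[ARecordAnswer],
                    max_ttl: int = 300,
                    min_distinct_ips: int = 10,
                    min_distinct_nets: int = 5) -> int:
    """Count how many (hypothetical) fast-flux indicators a domain shows, 0 to 3."""
    answers = list(answers)
    if not answers:
        return 0
    ips = {ipaddress.ip_address(a) for ans in answers for a in ans.addresses}
    # Group IPv4 addresses by /16 prefix as a rough proxy for network diversity;
    # a real system might map addresses to autonomous systems instead.
    nets = {ipaddress.ip_network(f"{ip}/16", strict=False)
            for ip in ips if ip.version == 4}
    indicators = [
        max(ans.ttl for ans in answers) <= max_ttl,  # consistently short TTLs
        len(ips) >= min_distinct_ips,                # many distinct A records
        len(nets) >= min_distinct_nets,              # addresses spread widely
    ]
    return sum(indicators)
```

A domain answered repeatedly with a one-hour TTL and a single address scores 0, while one answered with 60-second TTLs and a different address in a different /16 each time can reach 3. The score is a heuristic input to labelling, not a verdict on its own.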
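One way to turn labelled requests into labelled hosts and then into behavioural distributions is sketched below. It assumes a deliberately simple rule: a request is "bad" if its domain is blacklisted or has been flagged as pseudo-random or fast-flux, "good" if whitelisted, and a host is "infected" once it makes any bad request (otherwise "unknown_status"). The number of distinct domains each host queries is used as a single example behaviour. All helper names and the choice of feature are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Query = Tuple[str, str]  # (host, queried domain)


def label_request(domain: str, blacklist: Set[str], whitelist: Set[str],
                  flagged: Set[str]) -> str:
    """Label a single request as 'bad', 'good' or 'unlabelled'."""
    if domain in blacklist or domain in flagged:
        return "bad"
    if domain in whitelist:
        return "good"
    return "unlabelled"


def label_hosts(queries: Iterable[Query], blacklist: Set[str],
                whitelist: Set[str], flagged: Set[str]) -> Dict[str, str]:
    """Mark a host 'infected' if it made any bad request, else 'unknown_status'."""
    labels: Dict[str, str] = {}
    for host, domain in queries:
        if label_request(domain, blacklist, whitelist, flagged) == "bad":
            labels[host] = "infected"
        else:
            labels.setdefault(host, "unknown_status")
    return labels


def distinct_domain_distribution(queries: Iterable[Query],
                                 host_labels: Dict[str, str]) -> Dict[str, Counter]:
    """Per host label, a histogram of how many distinct domains each host queried."""
    per_host: Dict[str, Set[str]] = defaultdict(set)
    for host, domain in queries:
        per_host[host].add(domain)
    hist: Dict[str, Counter] = {"infected": Counter(), "unknown_status": Counter()}
    for host, domains in per_host.items():
        hist[host_labels[host]][len(domains)] += 1
    return hist
```

Other per-host behaviours could be histogrammed the same way, and the two resulting distributions compared to see which behaviours separate infected hosts from the rest.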
VB2013 takes place 2-4 October 2013 in Berlin, Germany.