VB2017 preview: Beyond lexical and PDNS (guest blog)

Posted on Oct 5, 2017

In this special guest blog post, VB2017 Silver sponsor Cisco Umbrella writes about a paper that researchers Dhia Mahjoub and David Rodriguez will present at the conference this Friday.


Over the past decade, detection of DGA (Domain Generation Algorithm) domains has relied primarily on lexical analysis of domain names, tracking of NX (non-resolving) domains, and malware reversing. The earliest works were groundbreaking, but since then we have observed only small, incremental improvements that combine machine learning techniques with sandbox-based analysis of DGA malware.


Figure 1: Time-dependent user-domain interactions.

In a talk to be given by Cisco Umbrella researchers Dhia Mahjoub and David Rodriguez at VB2017 this Friday, we propose to completely reframe the problem and take advantage of worldwide visibility into user-domain interactions via DNS. We introduce a novel approach that not only represents client-domain interactions as a bipartite graph but also carefully studies how the topological properties of this graph evolve over time, hence the concept of 'time series on graphs'.
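To make the idea of 'time series on graphs' concrete, here is a minimal sketch in Python. It is not the implementation presented in the talk: it assumes DNS logs have already been reduced to hypothetical `(hour, client_ip, domain)` records, builds one bipartite graph per hour, and reads off how a single client-domain edge weight (here simply a query count) changes from hour to hour. The function names and the record layout are illustrative assumptions.

```python
from collections import defaultdict


def build_hourly_graphs(queries):
    """Group (hour, client, domain) records into one bipartite graph per hour.

    Each hourly graph is a dict: client -> {domain: query_count}.
    The record layout is an assumption made for this sketch.
    """
    graphs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for hour, client, domain in queries:
        graphs[hour][client][domain] += 1
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {
        hour: {client: dict(domains) for client, domains in graph.items()}
        for hour, graph in graphs.items()
    }


def edge_weight_series(graphs, client, domain):
    """Return [(hour, weight)] for one client-domain edge, in hour order.

    Hours in which the client did not query the domain get weight 0.
    """
    return [
        (hour, graphs[hour].get(client, {}).get(domain, 0))
        for hour in sorted(graphs)
    ]


if __name__ == "__main__":
    sample = [
        (0, "10.0.0.1", "a.example"),
        (0, "10.0.0.1", "a.example"),
        (1, "10.0.0.1", "b.example"),
        (1, "10.0.0.2", "a.example"),
    ]
    hourly = build_hourly_graphs(sample)
    print(edge_weight_series(hourly, "10.0.0.1", "a.example"))
```

In this toy data the client queries `a.example` twice in the first hour and not at all in the second, so the edge weight drops from 2 to 0. At real scale the same grouping would be done by a distributed job rather than in memory.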


Figure 2: A client (a yellow triangle) querying domains (grey circles) at different rates at different times.

We unravel these time-dependent graphs by tracking bots surfing the Internet. By 'bot', we mean a client machine that is infected with malware and controlled by some other machine(s). We then study how these machines query domains on the Internet. Reciprocally, we study domains receiving queries from bots. And here's the breakthrough: the sender of queries and the receiver of queries appear to be symmetrical and loaded with action.

Stepping back, it becomes clear that a bot is defined not only by the speed at which it queries domains, but also by the diversity, repetition, and popularity of those domains. Similarly, algorithmically generated domains, typically deployed in botnets, are defined not only by the speed at which they are queried by clients, but also by the diversity, repetition, and chattiness of those clients.


Figure 3: Graphical properties serving as building blocks to signals. 

Each machine-to-domain edge in the graphs we analyse carries values indicating the force with which a machine is attracted to a domain, or a domain is attracted to a client. This interaction is one of millions that occur hourly, creating one very noisy graph. As we observe this graph over time, we see the values fluctuate with differing velocity. The beauty is in seeing these sender/receiver signals isolate domains used in a broad variety of campaigns: Necurs, Conficker, Suppobox, PykSpa, and more.


Figure 4: Interactions of one node in a graph with another, with edge weights varying over time. 

Using Hadoop technologies, we derive methods for creating and storing these signals computed on graphs, mapping tens of millions of user-domain interactions. In our talk, we will explain how we broke this problem down into smaller sub-problems that could be solved with effective MapReduce jobs woven into Oozie workflows, and why we chose not to use Spark and GraphX but to build our own graph and graph metric techniques.

Come to our talk to learn about these new methods, which define and analyse a few intuitive yet effective features on the nodes of any bipartite graph, with a network security twist:

  1. Chattiness of a user IP (or the number of unique domains this user queried over a period of time)
  2. Popularity of a domain (or the number of unique user IPs that queried this domain over a period of time)
  3. Jaccard similarity for a user IP (or the percentage of similar domains this user IP queries from one hour to the next)
  4. Jaccard similarity for a domain (or the percentage of similar user IPs that queried this domain from one hour to the next)
  5. Spread (or the ratio between the average and median Jaccard similarity for a user IP or a domain).
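The five features listed above can be sketched in a few lines of Python. This is an illustrative, in-memory sketch rather than the MapReduce implementation described in the talk. It assumes each hourly graph is a hypothetical dict mapping a user IP to the set of domains it queried in that hour, and it takes Jaccard similarity in its usual form (size of the intersection divided by size of the union) between consecutive hours. All function names are assumptions, as is the choice to return `None` for the spread when the median is zero.

```python
from statistics import mean, median


def chattiness(graph, user_ip):
    """Number of unique domains a user IP queried in one time window."""
    return len(graph.get(user_ip, set()))


def clients_of(graph, domain):
    """Set of user IPs that queried the given domain in one time window."""
    return {ip for ip, domains in graph.items() if domain in domains}


def popularity(graph, domain):
    """Number of unique user IPs that queried a domain in one time window."""
    return len(clients_of(graph, domain))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_series(sets_by_hour):
    """Jaccard similarity between each pair of consecutive hourly sets."""
    return [jaccard(x, y) for x, y in zip(sets_by_hour, sets_by_hour[1:])]


def spread(similarities):
    """Ratio of mean to median similarity; None if the median is zero."""
    med = median(similarities)
    return mean(similarities) / med if med else None


if __name__ == "__main__":
    # Hypothetical hourly graphs: hour -> {user_ip: set of queried domains}.
    hourly = {
        0: {"c1": {"a", "b"}},
        1: {"c1": {"b", "c"}, "c2": {"b"}},
        2: {"c1": {"b", "c"}},
    }
    hours = sorted(hourly)
    user_sims = jaccard_series([hourly[h].get("c1", set()) for h in hours])
    domain_sims = jaccard_series([clients_of(hourly[h], "b") for h in hours])
    print(chattiness(hourly[0], "c1"), popularity(hourly[1], "b"))
    print(user_sims, spread(user_sims))
    print(domain_sims, spread(domain_sims))
```

Because the graph is bipartite, the user-side and domain-side features are mirror images of each other: the same `jaccard_series` and `spread` helpers apply whether the hourly sets hold a user's domains or a domain's users.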

'Beyond lexical and PDNS: using signals on graphs to uncover online threats at scale' will be presented by Dhia Mahjoub and David Rodriguez at 14:00 on Friday 6 October in the Red room.


