Bindex 2.0

Tim Ebringer, Microsoft

We present the algorithms, applications and new experiments behind the next generation of Bindex, our in-house binary search engine. Bindex supports binary queries as short as four bytes across terabytes of data. The latest version scaled successfully to a much larger deployment, meeting or exceeding all of our performance goals. We currently index memory dumps of malware processes (to bypass obfuscation and packers), as well as a clean file set.
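
The abstract does not describe the data structures inside Bindex. One common way to answer exact byte queries of four or more bytes is an inverted index over overlapping 4-byte n-grams, and the Python sketch below assumes that design purely for illustration. The class `NGramIndex`, its methods and the in-memory storage are hypothetical; a terabyte-scale system would keep posting lists on disk and compressed.

```python
from collections import defaultdict

GRAM = 4  # assumption: gram length chosen to match the four-byte minimum query


class NGramIndex:
    """Hypothetical in-memory inverted index from 4-byte grams to sample IDs."""

    def __init__(self):
        self.postings = defaultdict(set)  # gram -> set of sample IDs containing it
        self.samples = {}                 # sample ID -> raw bytes (for verification)

    def add(self, sample_id, data):
        self.samples[sample_id] = data
        # Index every overlapping 4-byte window of the sample.
        for i in range(len(data) - GRAM + 1):
            self.postings[data[i:i + GRAM]].add(sample_id)

    def query(self, pattern):
        if len(pattern) < GRAM:
            raise ValueError("query must be at least %d bytes" % GRAM)
        # Intersect the posting lists of every gram in the pattern.
        candidates = None
        for i in range(len(pattern) - GRAM + 1):
            ids = self.postings.get(pattern[i:i + GRAM], set())
            candidates = set(ids) if candidates is None else candidates & ids
            if not candidates:
                return set()
        # Grams can occur in a sample without being adjacent, so confirm
        # that each candidate really contains the full byte pattern.
        return {sid for sid in candidates if pattern in self.samples[sid]}


if __name__ == "__main__":
    idx = NGramIndex()
    idx.add("a", b"GIF89a payload")
    idx.add("b", b"clean GIF89 no")
    print(idx.query(b"GIF89a"))  # {'a'}
```

The final verification step matters: posting-list intersection only proves that every gram appears somewhere in a sample, not that the grams are contiguous.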

Bindex is used to find related samples, name samples and avoid false positives. Its greatest strength is instant feedback: malware researchers can run several speculative queries in the time it takes to rebuild the signatures. It is now ingrained in our research workflow, and we present several examples of unusual and successful queries, such as a binary query against the bytes of the GIF image embedded in a rogue security application.

Early Bindex results were presented at CARO 2010, but the algorithms and data structures have since changed significantly to address scalability. We will present the new algorithms behind Bindex 2.0, as well as the workflows our research team has adopted during its first year in production.

Finally, we will present a new derived application that displays a 'heat map' of byte rareness in IDA. Library code has typically been indexed many times, so the heat map gives a visual cue that such code is common and unsuitable for a signature.
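
The abstract does not say how rareness is scored. The sketch below assumes that each byte's rareness is derived from the document frequency of the 4-byte grams covering it, that is, how many indexed samples contain each gram. The names `rareness_map`, `heat_levels` and `doc_freq` are hypothetical, and pushing the resulting colours into IDA is not shown.

```python
import math

GRAM = 4  # assumption: same gram length as the index


def rareness_map(code, doc_freq, total_samples):
    """Return one score in [0, 1] per byte of `code` (1.0 = never seen).

    doc_freq: hypothetical mapping from a 4-byte gram to the number of
    indexed samples that contain it.
    """
    if total_samples < 1:
        raise ValueError("total_samples must be at least 1")
    n = len(code)
    sums = [0.0] * n
    hits = [0] * n
    for i in range(n - GRAM + 1):
        df = doc_freq.get(code[i:i + GRAM], 0)
        # Log scaling: a gram present in every sample scores 0.0,
        # an unseen gram scores 1.0. Clamp in case df > total_samples.
        score = max(0.0, 1.0 - math.log1p(df) / math.log1p(total_samples))
        # Each byte accumulates the scores of every gram that covers it.
        for j in range(i, i + GRAM):
            sums[j] += score
            hits[j] += 1
    # Bytes covered by no gram (code shorter than GRAM) are treated as unseen.
    return [s / h if h else 1.0 for s, h in zip(sums, hits)]


def heat_levels(scores, levels=4):
    """Bucket [0, 1] scores into integer heat levels 0..levels-1."""
    return [min(int(s * levels), levels - 1) for s in scores]
```

A common prologue seen in all 1,000 indexed samples scores 0.0, while bytes that are only covered by unseen grams score 1.0; the integer levels could then be mapped to background colours in the disassembly view.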

