Malware deobfuscation: symbolic analysis to the rescue!

Friday 6 October 11:00 - 11:30, Red room

Sébastien Bardin (CEA LIST)
Robin David (CEA LIST)
Jean-Yves Marion (LORIA)

Over the years, obfuscation has taken a significant place in the software protection field. The term generally embraces any means aimed at slowing down the analysis of a program, either by an analyst or an automated algorithm. As such, it has gained a certain popularity in the video-game industry. Unfortunately, it has also gained popularity in the malware underground ecosystem - in order to delay malware comprehension, leading to the need for deobfuscation techniques.

In the broadest sense, deobfuscation is the means to make the behaviour of a piece of malware more intelligible, taken as a fact that recovering the original program is generally impossible. Since the first step towards understanding a binary program is to disassemble it in order to obtain a good representation of its Control-Flow Graph (CFG), obfuscation techniques (also) aim at fooling existing disassembly tools and techniques: while static disassembly covers the whole program but is quickly fooled by obfuscations such as self-modification, dynamic disassembly helps to get a real execution trace of the program but is limited to one or a few execution paths.

Symbolic execution has recently been proposed as an interesting alternative to deobfuscation: more robust than static analysis and more complete than dynamic analysis. Interestingly, the technique amounts to reason over the semantic of the program (which is not modified by obfuscation) rather than its syntax (which is heavily modified). Yet, the technique is still young, and can quickly suffer from scalability issues.

We show in this talk how to combine in a successful way several state-of-the-art variants of symbolic execution together with dynamic analysis and static analysis in order to help recover a more precise CFG of the obfuscated code under analysis. In particular, dynamic analysis brings robustness to tricky obfuscations such as self-modification, variants of symbolic executions can both discover new parts of the obfuscated code and prove that other parts are spurious (e.g. not reachable because of protections such as opaque predicates or stack tampering), and standard static analysis can be guided in a safe way to extend the disassembly. We will explain in detail the method and how it is implemented in the open-source platform BINSEC, and describe successful case studies on state-of-the-art packers and the government-grade X-Tunnel malware - allowing its entire deobfuscation.

The end goal is to empower the reverse-engineering by giving the analyst semantic information about the program such as obfuscation in order to hold all the cards in hand for a better and deeper understanding of the binary being analysed.

Symbolic deobfuscation methods are a recent and hot topic, holding great promise but not easy to apply in practice. This talk aims to demystify these techniques, presenting them in a clear manner - from basic concepts to the latest cutting-edge implementations - and giving insights into their strengths and weaknesses, including potential counter-measures (together with possible mitigations). This line of work is also in the continuation of other efforts to bring more semantic analysis into the anti-malware arsenal.




Sébastien Bardin

Sébastien Bardin obtained his Ph.D. in 2005 at ENS Cachan, France, in the field of formal methods. He joined CEA LIST, France, in 2006 as a full-time researcher, with research activities centered on program analysis and automatic software verification. For a few years now, Sébastien has been interested in automating software-level security analysis by lifting formal methods developed for the safety-critical industry. In particular, he focuses on binary-level formal methods, vulnerability detection & assessment, and malware deobfuscation. He leads the 'binary-level security' group at CEA LIST as well as several related research projects, and he is one of the main designers of the (open-source) BINSEC platform for binary-level code analysis.



Robin David

Robin David is a software security engineer at Quarkslab in Paris, working on various reverse-engineering and R&D subjects. He previously performed a Ph.D. at CEA LIST (Atomic Energy Commission, France) working in collaboration with the High Security Lab of LORIA (CNRS, Inria and Lorraine University) under the supervision of Jean-Yves Marion. His Ph.D. was on the application of formal methods for security purposes and in particular for malware deobfuscation. As such he implemented the core dynamic symbolic engine in BINSEC along with the instrumenter (Pinsec) and an associated IDA plug-in (IDASec).



Jean-Yves Marion

Jean-Yves Marion is professor at Université de Lorraine, France. He is the director of the computer science lab LORIA ( which is affiliated to CNRS, Inria and Université de Lorraine. He got his habilitation in 2002 from Université Nancy 2 and his Ph.D. from Université Paris 7. He spent two years at Indiana University. His research interests is computer security and more specifically malware. He is one of the co-founders of the High Security Lab (LHS) which was in 2010 a unique research platform to conduct computer security experiments. He has published about 80 papers and supervised about 25 Ph.D. theses. Since 2015, he has been a member of Institut Universitaire de France.


   Download slides







Other VB2017 papers

Mariachis and jackpotting: ATM malware from Latin America

Thiago Marques (Kaspersky Lab)

Fabio Assolini (Kaspersky Lab)

Of all the forms of attack against financial institutions in the world, the ones that are most likely to combine traditional…

XAgent: APT28 cyber espionage on macOS

Tiberius Axinte (Bitdefender)

This paper provides an in-depth analysis of the macOS version of the APT28 component known as XAgent. We will dissect the…

Keynote address: Inside Cloudbleed

John Graham-Cumming (Cloudflare)

In February 2017, Cloudflare was revealed to have been leaking private information including HTTP headers, cookies and POST data…