A Morphological Approach to Binary Code Analysis
Binary code analysis is a complex process which can be performed nowadays only by skilled cybersecurity experts whose workload just keeps increasing. Uses cases include vulnerabilities detection, testing, clustering and classification, malware analysis, etc… We develop a tool named Gorille, which is based on the reconstruction of an high level semantics for the binary code. Control flow graphs provide a fair level of abstraction to deal with the binary codes they represent. After applying some graph rewriting rules to normalize these graphs, our software tackles the subgraph search problem in a way which is both efficient and convenient for that kind of graphs. This technique is described as morphological analysis as it recognizes the whole shape of the malware.
That being said, some pitfalls still need to be considered. First of all, the output can only get as good as the input data. And it is known that static disassembly cannot produce the perfect control flow graph since this problem is undecidable. As a matter of facts, malware heavily use obfuscation techniques such as opaque predicates to hide their payloads and confuse analyses. Dynamic analysis should then be used along with static disassembly to combine their strengths. Another dangerous pitfall feared by every expert is the so-called false positives rate : false alarms that make them waste indeed a precious time assessing the reality of the threat. Shared binary code is not always relevant as many software embed static standard libraries. Gorille’s solution to this issue lies in graph rewriting. By rewriting classic subgraphs into configuration-based special nodes, we even obtain an higher abstraction of the control flow graph.