Obfuscation

One topic I'm often asked about is obfuscation of managed code. In the context of software, obfuscation is the process of scrambling the symbols, code, and data of a program to prevent reverse engineering.

Optimizing C++ compilers for native code tend to produce code that is effectively obfuscated by default. In the process of optimizing, the compiler often rearranges the code quite a bit, and symbols are stripped from retail builds. In contrast, managed code compilers (C#, VB.NET, etc.) generate IL, not native assembly code. This IL tends to be consistently structured and fairly easy to reverse engineer. Most optimization happens when the IL is JIT-compiled into native code, not during compilation.

This means it's pretty easy to take a compiled assembly and decompile it into source code, using a tool such as Reflector. While this is a non-issue for web scenarios where all the code resides on the server, it's a big issue for some client scenarios, especially ISV applications. These client applications may contain trade secrets or sensitive information in their algorithms, data structures, or data. This is where obfuscation tools come in.

Obfuscation tools mangle symbols and rearrange code blocks to foil decompiling. They may also encrypt strings containing sensitive data. It's important to understand that obfuscators (as they exist today) can't completely protect your intellectual property. Because the code is on the client machine, a really determined hacker with lots of time can study the code and data structures enough to understand what's going on. Obfuscators do provide value in raising the bar, however, defeating most decompiler tools and preventing the casual hacker from stealing your intellectual property. They can make your code as difficult to reverse engineer as optimized native code.
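
To make the symbol-mangling and string-scrambling ideas concrete, here is a toy sketch in Python. It is not how any particular obfuscator works: the function names (`build_rename_map`, `xor_scramble`) and the sample symbols are made up for illustration, and the single-byte XOR is only a stand-in for whatever string protection a real tool applies.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... as meaningless replacement identifiers."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_map(symbols):
    """Give each distinct symbol a short name, in first-seen order (hypothetical helper)."""
    names = short_names()
    mapping = {}
    for symbol in symbols:
        if symbol not in mapping:
            mapping[symbol] = next(names)
    return mapping


def xor_scramble(data: bytes, key: int) -> bytes:
    """Toy string scrambling: XOR every byte with a one-byte key (0-255).

    Applying it twice with the same key restores the input. This is NOT real
    encryption; it only hides the text from a casual look at the binary.
    """
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    # Sample symbol names are invented for this example.
    rename_map = build_rename_map(["RoyaltyCalculator", "ComputeRate", "LicenseKey"])
    print(rename_map)

    hidden = xor_scramble("connection secret".encode("utf-8"), 0x5A)
    print(hidden)
    print(xor_scramble(hidden, 0x5A).decode("utf-8"))
```

Even in this toy form the limitation described above is visible: the key and the descrambling routine have to ship with the program, so someone with enough patience can recover the original strings.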

If you're interested in obfuscation for your code, I recommend taking a look at one of the third-party obfuscators that work on managed code. For example, Visual Studio ships with the community edition of Dotfuscator, a popular obfuscation package. The community edition only mangles symbol names, so it doesn't do everything the full-featured editions do, but it will at least give you an idea of how an obfuscator works. Keep in mind that obfuscating your code may make debugging more difficult or even impossible. Many of the third-party obfuscators have features that help with debugging, however, such as keeping a mapping file from obfuscated symbol names to original symbol names.
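
As a rough illustration of how such a mapping file helps with debugging, the Python sketch below translates an obfuscated stack-trace line back to the original names. The `obfuscated -> original` text format, the helper names (`parse_map`, `restore_names`), and the sample trace are all assumptions made for this example; real obfuscators define their own map formats and ship their own tools for this.

```python
import re


def parse_map(text):
    """Parse lines of the form 'obfuscated -> original' (a hypothetical format)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        obfuscated, original = (part.strip() for part in line.split("->", 1))
        mapping[obfuscated] = original
    return mapping


def restore_names(trace, mapping):
    """Replace each whole identifier in the trace that appears in the mapping."""
    def swap(match):
        word = match.group(0)
        return mapping.get(word, word)

    # Match whole identifiers only, so the 'a' inside 'at' is left alone.
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", swap, trace)


if __name__ == "__main__":
    # Invented sample data for illustration.
    map_text = "a -> RoyaltyCalculator\nb -> ComputeRate\n"
    trace_line = "at a.b(Int32 x)"
    print(restore_names(trace_line, parse_map(map_text)))
```

This sketch assumes every obfuscated name maps to exactly one original name. An obfuscator that reuses the same short name for several different symbols would need a map keyed by more context, such as the containing type or the method signature.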

I'm also asked about Microsoft's stance on obfuscation: do we obfuscate our own code? The answer for the .NET Framework is no. As a development platform, it makes more sense not to obfuscate, so we protect our intellectual property by other means. Some Microsoft products that use managed code have opted to obfuscate, however, so we do not have a one-size-fits-all approach within the company.

I'd be interested to hear your opinions of or experience with obfuscation. Were you able to protect your code? What problems did you run into?
