An attempt to find a suitable Java “Obfuscator”

27 08 2007

The Plenty of code blog recently published a list of free/open-source Java decompilers and obfuscators, and there was an interesting discussion about this on the JavaPosse #139 podcast.

This all happened to coincide with my own search for an obfuscator, so it seems worth describing what I found.

My reason for looking for an obfuscator was to add some degree of protection to some of the licencing code in my ObMimic product. I’m not expecting to make this “unbreakable” – realistically there’s very little one can do to make such code un-hackable. Also, being a tool for Java developers, I’d expect ObMimic’s users to be rather more trustworthy and respectful of licences than the general population (and legitimate licences are going to be pretty cheap anyway).

It’s also important that the rest of the ObMimic library remains amenable to IDE code-completion, variable inspection in debuggers etc.

So all I’m looking for is some way to make a handful of specific classes and methods at least slightly non-trivial to decompile, adjust and replace. For such limited results, the effort and cost involved need to be low, and there must be minimal risk of introducing bugs or other problems. Also, any such tool will run as part of an Ant build script, so Ant integration is relevant but a GUI front-end isn’t.

Given the tendency for everything to be open-source these days, one wouldn’t expect obfuscators to be a hot topic. But even taking this into account, a quick look at various Java obfuscators was rather disappointing:

  • Many of these tools haven’t been updated in a long time, don’t cater for Java SE 5 or higher, and look like “abandonware”.
  • Even some of those that are still “current” have a rather ancient feel (e.g. “COM.*” package names, no Ant task, use of CLASSPATH etc.).
  • For most of the tools, “obfuscation” is almost synonymous with renaming, which is presented as a way of reducing code size at least as much as a way of obfuscating it. It might be more accurate to describe these as “renaming” tools that claim size reduction and obfuscation as possible benefits (with varying degrees of emphasis).
  • They all seem to be based on wanting to obfuscate everything and having to identify or specify which things should be left un-obfuscated. Any explicit configuration tends to be in terms of what should be excluded from the obfuscation, sometimes separately for different types of obfuscation. This can make it difficult or impossible to specify that you only want to apply particular obfuscations to particular elements.
  • It’s often surprisingly hard to check exactly what changes these tools have made – their logs tend to show either too little or too much (e.g. a handful of changes buried in lists of everything that has not been changed).
  • For the commercial products, support is generally by e-mail, with no obvious sign of any public forums or bug lists.

The overall impression is that most of these tools are rather long in the tooth, and that many of them are focused primarily on “renaming” just on the basis that it can be done, rather than addressing the requirements for protecting code from reverse-engineering.

For my own purposes, “renaming” isn’t much use. I’m just looking to protect selected methods whilst leaving most of the code amenable to examination and debugging within IDEs, and renaming isn’t going to achieve much if it only affects a tiny number of methods and local variables. So instead I’m looking for “control-flow” obfuscation and any other anti-decompiler measures that can be applied to selected classes and methods.
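To make the difference from renaming concrete, here is a hand-written, source-level sketch of one common control-flow technique, “flattening”. The validation rule (an 8-character key whose character codes sum to a multiple of 7) and all the names are hypothetical and have nothing to do with ObMimic’s actual licencing code. Real obfuscators rewrite byte-code directly and can produce instruction sequences with no Java source equivalent at all, which is one reason a decompiler may end up showing untranslated byte-code.

```java
/** Hypothetical illustration of control-flow flattening at the source level. */
public final class FlatteningDemo {

    // Straightforward version: its if/for structure is obvious once decompiled.
    static boolean isValidPlain(String key) {
        if (key == null || key.length() != 8) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % 7 == 0;
    }

    // Same logic, "flattened" into a dispatcher loop driven by a state variable,
    // so the original branches and loop are no longer directly visible.
    static boolean isValidFlattened(String key) {
        int state = 0;
        int sum = 0;
        int i = 0;
        boolean result = false;
        while (state != 99) {                 // 99 = exit state
            switch (state) {
                case 0:                       // null check
                    state = (key == null) ? 5 : 1;
                    break;
                case 1:                       // length check
                    state = (key.length() != 8) ? 5 : 2;
                    break;
                case 2:                       // loop condition
                    state = (i < key.length()) ? 3 : 4;
                    break;
                case 3:                       // loop body
                    sum += key.charAt(i);
                    i++;
                    state = 2;
                    break;
                case 4:                       // final test
                    result = (sum % 7 == 0);
                    state = 99;
                    break;
                case 5:                       // early rejection
                    result = false;
                    state = 99;
                    break;
                default:
                    throw new IllegalStateException("unreachable state " + state);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {null, "short", "ABCDEFGH", "AAAAAAAF"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidPlain(s) + " / " + isValidFlattened(s));
        }
    }
}
```

Both methods give the same answers; the flattened one just makes the logic more tedious to follow and patch, which is roughly the “slightly non-trivial” level of protection being sought.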

After ruling out the “abandonware”, products without “control-flow” obfuscation, and products that are otherwise clearly unsuitable, I ended up with the following shortlist (all commercial products, by the way):

  • Dash-O
  • SmokeScreen
  • Allatori
  • Zelix KlassMaster

Of these, Dash-O might be very good – the web-site and documentation look as good as or better than any of the others, it seems to have a full set of features, and they claim it’s used by Sun for JCE and JSSE code. However, it’s more expensive than I’m willing to pay at the moment given my limited needs and limited funds ($1,890 for a single named developer, or $4,950 for one “build machine” and up to five named developers), whilst not offering me anything especially unique. So I haven’t looked at it in any detail.

SmokeScreen looked quite promising, but proved rather tricky to configure as required. More fundamentally, it crashed on one of my classes (with a “Non-array descriptor for array reference” message, on code that compiles and runs successfully, passes all my FindBugs/PMD/Checkstyle checks, and didn’t cause any problems for the other tools).

Allatori looked good in terms of overall quality and usability, but its configuration facilities didn’t seem to provide any way to restrict it to just “control-flow” obfuscation on just the relevant methods. Also, when applied to ObMimic’s licencing code its “control-flow” obfuscation had only minimal effect. It did give better results on other methods, so at a pinch this could possibly be improved by adjusting the code, but that still wouldn’t solve the configuration issues.

Zelix KlassMaster nearly didn’t get looked at, because their web-site rather put me off – the home page looks amateurish and loads a fairly pointless video.

When I did try it, the scripting language through which it can be controlled looked very flexible but proved rather tricky to configure correctly. In the end I got it to do what I needed by:

  • Defining my own @Obfuscatable annotation, using this to mark the methods I want to have obfuscated, and then configuring KlassMaster to exclude all methods that don’t have this annotation. This led to a much simpler and more maintainable KlassMaster configuration, and looks like a useful general technique regardless of the particular obfuscation tool used.
  • Using the “verbose” option for KlassMaster’s log file so that it explains the reasons for excluding individual methods. This pointed out a particular call from a static-initializer as its reason for not obfuscating one of the methods. A minor change to the relevant code solved this.
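The post doesn’t show the annotation itself, so the following is only a minimal sketch of what such a marker might look like, together with a small reflection-based check that lists which methods carry it. The annotation name comes from the post; LicenceChecker, its methods and markedMethods are hypothetical. The retention policy is an assumption: a tool working on class files needs at least RetentionPolicy.CLASS (SOURCE annotations are discarded by the compiler), and RUNTIME is used here only so that the reflection check can see it. The KlassMaster script that refers to the annotation is not reproduced.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ObfuscatableDemo {

    /** Marker: the annotated method should receive control-flow obfuscation. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)   // assumption; CLASS would suffice for a byte-code tool
    @Target(ElementType.METHOD)
    public @interface Obfuscatable {
    }

    // Hypothetical licence-checking class: only the sensitive method is marked.
    static final class LicenceChecker {
        @Obfuscatable
        boolean verifySignature(byte[] licence) {
            return licence != null && licence.length > 0;   // placeholder logic
        }

        String describeLicence() {   // unmarked: stays readable in debuggers
            return "licence details";
        }
    }

    /** Lists "Class.method" for every declared method carrying @Obfuscatable. */
    static List<String> markedMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Obfuscatable.class)) {
                names.add(type.getSimpleName() + "." + m.getName());
            }
        }
        Collections.sort(names);   // getDeclaredMethods() order is unspecified
        return names;
    }

    public static void main(String[] args) {
        System.out.println(markedMethods(LicenceChecker.class));   // [LicenceChecker.verifySignature]
    }
}
```

Because the obfuscator’s configuration only has to say “methods with this annotation”, the selection lives next to the code it affects rather than in a separate exclusion list.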

Having sorted this out, the resulting obfuscation looks ok – viewing the class files in a decompiler shows some pretty garbled code including untranslated byte-code. It’s a bit hard to make a final judgement, as the evaluation version of KlassMaster limits its “code-flow” obfuscation to no more than one or two methods per class, but it seems to be doing what I need from it.

So for the time being, my plan is to go with Zelix KlassMaster combined with my own @Obfuscatable annotation to indicate the relevant methods.

I do like the idea of having an annotation to control and document which methods should be obfuscated, so I will probably keep this even if I ultimately end up switching to a different obfuscator.

More generally, this is clearly a small and unfashionable niche within the overall Java ecology, but you’d think there would be scope for someone to come in with a truly modern tool for protecting Java class files from reverse-engineering. My wish list would be:

  • Focused on code-flow obfuscation and other “anti-decompiler” techniques rather than renaming.
  • Taking advantage of the latest JVM enhancements where possible. For example, perhaps the addition of “invokedynamic” to the JVM will provide new ways to adjust byte-code so that it can’t be decompiled to Java source code?
  • Ability to specify “positive” configuration of which techniques to apply to which elements, rather than “negative” exclusions.
  • Logs that clearly show what changes the tool has made.
  • Cheaper than Dash-O.
  • With a modern web-site and public discussion forums.
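As a purely hypothetical illustration of the “positive configuration” item, the sketch below models rules of the form “apply these techniques to elements matching this selector”, with anything not selected left untouched. Technique, Rule, apply and techniquesFor are invented for this example and don’t correspond to any existing obfuscator’s configuration format; matching on a name prefix is just one possible selector.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model of "positive" obfuscation configuration. */
public final class PositiveConfigDemo {

    /** Hypothetical set of techniques a tool might offer. */
    enum Technique { RENAMING, CONTROL_FLOW, STRING_ENCRYPTION }

    /** One rule: apply these techniques to elements the selector accepts. */
    private static final class Rule {
        final Set<Technique> techniques;
        final Predicate<String> selector;

        Rule(Set<Technique> techniques, Predicate<String> selector) {
            this.techniques = techniques;
            this.selector = selector;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule (defensively copying the techniques) and returns this for chaining. */
    PositiveConfigDemo apply(Set<Technique> techniques, Predicate<String> selector) {
        Set<Technique> copy = EnumSet.noneOf(Technique.class);
        copy.addAll(techniques);
        rules.add(new Rule(copy, selector));
        return this;
    }

    /** Union of techniques from all matching rules; empty (untouched) by default. */
    Set<Technique> techniquesFor(String elementName) {
        Set<Technique> result = EnumSet.noneOf(Technique.class);
        for (Rule rule : rules) {
            if (rule.selector.test(elementName)) {
                result.addAll(rule.techniques);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PositiveConfigDemo config = new PositiveConfigDemo()
                .apply(EnumSet.of(Technique.CONTROL_FLOW),
                       name -> name.startsWith("com.example.licence."));
        System.out.println(config.techniquesFor("com.example.licence.Checker.verify")); // [CONTROL_FLOW]
        System.out.println(config.techniquesFor("com.example.api.Mock.getValue"));      // []
    }
}
```

The key design choice is the default: an element that no rule mentions gets no obfuscation at all, which is the reverse of the exclusion-based approach described above.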

It’s hard to tell if there’s enough of a market for anyone to make money out of this, but a good newcomer ought to be able to completely outclass the existing products, maybe even redefine the whole issue.


10 responses

27 08 2007
Rob Abbe

I didn’t see you mention ProGuard. It works pretty well, though I have seen it hang and crash in certain cases. I think it gets down to the level you need.

http://proguard.sourceforge.net/

28 08 2007
closingbraces

Rob,

Yeah, did try ProGuard. It looked ok for “name” obfuscation. I like its Ant task, with the ProGuard configuration embedded inside the Ant task and able to use normal Ant properties (Allatori was the only other one whose configuration file could directly use properties from the Ant script). I also see it as a real plus that there are public, active discussion forums and bug reporting.

But the configuration options didn’t seem to quite support what I needed (at least, not without a lot more digging around and experimenting than I had time for). ProGuard 4.0, currently in beta, claims to provide more “orthogonal and flexible” options together with the use of annotations, so that might do the trick.

However, the main problem from my point of view was the lack of “control-flow” obfuscation (or similar), which turned out to be what I really needed. If I was just obfuscating an entire application and “name” obfuscation would do the trick, I might well have chosen ProGuard as being good enough, generally neat and sensible, and free.

For info, I also tried yguard (http://www.yworks.com/en/products_yguard_about.htm) and RetroGuard (http://www.retrologic.com), with similar limitations (plus other problems of their own). Also looked at about half-a-dozen others that turned out to just be dead/obsolete.

Mike

28 08 2007
Hans-Eric Grönlund

Using annotations to configure an obfuscator sounds like a beautiful solution. Please consider writing a post on that topic. I’d read it with great interest.

28 08 2007
david

Here are some fallacies about obfuscating code:
1. I’ll make money selling it – you’ll only sell it to those people who are interested in doing business with you; anyone interested in stealing it won’t buy it in the first place.
2. It’s my revolutionary intellectual property – hiding your fabulous solution only hides you too; it’s best to fully tag/identify yourself and how to reach you within the code and if the code is truly revolutionary then people will come to you seeking out even more.
3. For security reasons – the data is the meat of any program and it should bear the weight of encryption, not necessarily the program code; people want to break the ‘unbreakable’, solve the unsolvable, and so open code is less of a ‘draw’ to would-be thieves.
5. Other obfuscation issues:
5.a. Maintenance/Support – ever tried to work with a customer who is having a problem with obfuscated code when the originator of that code is not available?
5.b. Modification – ever tried to work with a customer who has obfuscated code and needs a modification to that code (or to work with that code, slightly modified) and the originator of that code is not available?
5.c. Human failings – ever lost the original un-obfuscated code for some years-ago project and need it only to find the sole copy is the obfuscated one you or your customers have?

Honestly, does obfuscation provide any benefit? For the vast vast majority of enterprises, the promise of reusable and ‘off the shelf’ modules of code remains a pipe dream; companies, more often than not, want to have their own implementations and psychologically set themselves up to justify it by saying ‘our situation, our usage, is different from the existing by [insert reason]’. They won’t go out of their way to find/buy/steal any external code whether obfuscated or open source.
Leaving modules/methods/function unobfuscated and only protecting entire program suites when necessary (and yes I do agree that some data, some manipulations performed on sensitive personal data, should be out of ‘harms way’) is the way to go.

28 08 2007
closingbraces

Hans-Eric,

Thanks for the interest. The annotation itself is pretty trivial because it’s just a “marker” that can then be referred to by the obfuscator’s configuration file. But when I get a chance I’ll do a post showing it and explaining its use, alternatives, issues etc.

Mike

28 08 2007
closingbraces

David,

I agree with some of that, especially no. 1, but think it all has to be taken in context.

My own context is that:
– The product involved will have a zero-cost edition;
– People can decide for themselves whether or not to buy the “licenced” edition for its extra features as they see fit (it’s up to me to make that worthwhile…);
– None of the actual functionality will be obfuscated, only some very specific pieces of the licence-checking code;
– The obfuscation won’t be widespread “renaming”, just internal changes within the byte-code of specific methods to make them slightly non-trivial to decompile/replace;
– As stated in the post, I don’t see this as worth lots of time, effort, or cost, or as somehow foolproof – in keeping with point 1 of your comment I largely trust my likely customers, and just want a minor obstacle to effortless hacking by all and sundry.

More importantly, for anyone that wants the source code so that they can modify it, or to guarantee they can maintain/support it themselves if necessary, or if they want the unobfuscated code for any other reason, this will also be available – there will be a site-licenced edition that comes with full source code including test cases (with 100% code coverage) and Ant build script.

With regard to your general comment, this particular product is a tool to help with testing. There’s no issue of claiming this is some kind of reusable business component that’s somehow better than a company’s own way of doing things, and no issue over “data” security. I think you might have been imagining an entirely different category of product – probably my fault for not spelling this out.

Obviously, your mileage may vary – in many situations obfuscation may be a very bad idea, for the reasons you’ve explained. In other situations it may be entirely reasonable or even absolutely vital.

As for 5.c, anyone that suffers this has problems far more fundamental than obfuscation, and has only themselves to blame (source control? backup? disaster recovery? due diligence?). I don’t think relying on copying the code back from your customers is the answer, with or without obfuscation!

Mike

29 08 2007
callingshotgun

David-

While I agree with several of your points about obfuscation, it DOES have benefits. I spent some time as a J2ME developer for a local startup, and there was a hard limit on what size JAR file different phones would accept. ProGuard was an absolute dream in this scenario, shaving off 30%-40% of the JAR size; it was what let us get our applications onto the phone. A 128KB ceiling is a tricky obstacle, especially when you’re writing games (where graphics can easily take up a third of the overall JAR, if not more). We were able to write readable, maintainable code without sacrificing file size.

29 08 2007
closingbraces

Javalobby also now has a What’s the Best Code Obfuscator? discussion underway.

31 08 2007
Dmitry

Have you considered compiling your application down to native code using GCJ or Excelsior JET?

31 08 2007
closingbraces

Native code isn’t remotely appropriate for this product. It’s a library for Java programmers to use in their own testing, and it needs to be as portable and cross-platform as possible, and (as per the article) I want most of the library to be readily examinable in Java debuggers. That is, when users are debugging their own code that uses the library, it’s useful to be able to see the state of the library’s objects as they proceed.

More generally, the benefits of being cross-platform “pure Java” are important to me. The only thought I’ve given to native code is to immediately rule it out.
