Code-checking tools, and the pain of updating them.

19 02 2007

I currently use FindBugs, PMD and Checkstyle to check all of my code, and can highly recommend these tools. They have evolved to an impressive level of sophistication and reliability over the last year or so. Although there’s quite a lot of overlap between them, and each of them adds some time to my builds, there are enough differences between them that I find it useful to put all of my code through all three of these tools. Between them they provide a useful and varied set of checks, even allowing for the various rules in each tool that I ignore or customize in order to best match my own particular requirements.

I use these tools both when editing the code (by means of the relevant IDE plug-ins) and in my Ant builds. Overall I think they do a great job of catching lots of mistakes at the earliest possible moment (effectively at “compile time”), and have a very positive impact on the quality of my code. It’s hard to imagine life without them any more – it would be like not having a compiler to check the code.

Since adopting these tools, I’ve also noticed an effect on my reaction to other code that I read. I’ve always been rather fussy and fastidious when looking at code, but now that I take these tools for granted I tend to feel even more incredulous when I see obvious mistakes that these tools would spot automatically (like unused variables, assignments that have no effect etc.). Clearly, there’s a huge mass of existing code where nobody is going to use these tools and then immediately fix all of the reported problems. But it does tend to make all those little mistakes and inconsistencies look even sillier when you know that tools such as these would spot them immediately. Whilst most people’s code will just reflect their own particular level of skill or care at the time, one would kind of expect that by now the JDK and other “mainstream” products would be of the highest quality and not still riddled with the sort of problems that these tools can detect. Or at least that such projects would adopt one or more tools of this kind and gradually fix the problems. Unfortunately we still seem to be a long way from this. I applaud Bill Pugh and his FindBugs project for providing a demo where they show the results of running FindBugs on various JDKs and major open-source projects – it’s kind of scary that this finds something like 3,000 problems in the Sun JDK 1.6.0-b99 source code, but great to see someone getting this out into the open.
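To make the “obvious mistakes” mentioned above a little more concrete, here is a small made-up Java sketch containing an assignment that has no effect and an unused variable. The class and its names are purely hypothetical and not taken from any real code base. The point is only that the code compiles without complaint and yet is clearly wrong, which is exactly the kind of thing a checking tool can point out long before anyone runs it.

```java
/** Illustrative class (hypothetical, not from any real code base). */
final class Customer {
    private String name;

    // The parameter is misspelled, so the statement below assigns the field to itself.
    void setName(String nmae) {
        name = name; // assignment that has no effect: the field stays as it was
    }

    String getName() {
        return name;
    }

    int nameLength() {
        int fallback = 0; // unused variable: declared and initialised, never read
        return name == null ? -1 : name.length();
    }

    public static void main(String[] args) {
        Customer c = new Customer();
        c.setName("Alice");
        System.out.println(c.getName()); // prints "null": the setter never stored the argument
    }
}
```

Nothing here stops the build, and the bug in the setter only shows up at run time as a field that is mysteriously never set.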

The one real problem I’ve had with these tools is the amount of time I spend keeping them up to date. Updating the tools themselves is normally simple enough (though as with any such software there are the usual complications, such as changing dependencies on third-party libraries, IDE plug-ins that prove unreliable in the face of changing IDE versions, changing DTDs/Schemas for rules files etc.). The real problem is in maintaining the sets of rules to be used. These are typically specified or included/excluded by means of text files, so that you can select which rules you do and do not want applied to your code, customize relevant properties for each rule, or in some cases actually define the individual rules themselves (for example, for PMD you can specify rules by means of XPath expressions). But each new version of each tool brings a slew of:

  • New rules that you may or may not want to adopt (and that may or may not be working properly yet).
  • Changes to existing rules that may or may not affect the configuration of that rule in your rules files.
  • Fixes to bugs that may or may not allow you to adopt rules that you’d previously ignored because they didn’t work correctly.
  • Fixes to bugs where your code has previously been written in some particular way so as to avoid false-positives due to the bug (though rare, I’ve found that sometimes the best way to cope with a rule that generally works and is desirable but that doesn’t quite cope with some particular situation is to adjust the offending piece of code so that the rule is happy with it – and this then needs to be reviewed if the rule is fixed or enhanced in subsequent releases).
  • Newly-identified problems in your code due to new rules or other changes (which in some cases may themselves lead you to modify your rules files accordingly).

Each such update thus typically involves “merging” any relevant changes into your own rules files; reviewing your use of the rules to take into account all changes and bug fixes since the last version (which in turn depends on you keeping track of which rules you aren’t currently using and why); running the tool and updated rules files against your code; resolving any new problems that result; and reviewing and possibly removing any “work-arounds” in your code that are no longer necessary. This tends to be somewhat iterative, as you can’t entirely assess any changes to the rules until you’ve run them against all of your code, investigated any problems they report, and tried fixing the problems – at the end of which you might decide, for example, that a new rule is not yet working accurately enough to be adopted.
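The bookkeeping part of this (which rules are new, which of my recorded decisions refer to rules that no longer exist) is mechanical enough to sketch in code. The following is only a minimal illustration under my own assumptions: it supposes you can obtain the rule names of a release as a plain set of strings, and that you keep your own record of enabled rules and of excluded rules together with the reason for excluding them. The class name `RuleReview`, its methods and the rule names are all hypothetical and are not part of FindBugs, PMD or Checkstyle.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper comparing a release's rule names with your recorded decisions. */
final class RuleReview {

    /** Rules in the release for which you have recorded no decision yet. */
    static Set<String> needingDecision(Set<String> releaseRules,
                                       Set<String> enabled,
                                       Map<String, String> excludedWithReason) {
        Set<String> result = new TreeSet<String>(releaseRules);
        result.removeAll(enabled);
        result.removeAll(excludedWithReason.keySet());
        return result;
    }

    /** Rules you refer to that the release does not provide (renamed or removed). */
    static Set<String> noLongerProvided(Set<String> releaseRules,
                                        Set<String> enabled,
                                        Map<String, String> excludedWithReason) {
        Set<String> result = new TreeSet<String>(enabled);
        result.addAll(excludedWithReason.keySet());
        result.removeAll(releaseRules);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder rule names; a real list would come from the tool's release.
        Set<String> release = new TreeSet<String>(Arrays.asList("RuleA", "RuleB", "RuleC"));
        Set<String> enabled = new TreeSet<String>(Arrays.asList("RuleA", "RuleOld"));
        Map<String, String> excluded = new HashMap<String, String>();
        excluded.put("RuleB", "false positives on inner classes; re-check after next release");

        System.out.println("Review: " + needingDecision(release, enabled, excluded));
        System.out.println("Stale:  " + noLongerProvided(release, enabled, excluded));
    }
}
```

Keeping the reason alongside each excluded rule is the important bit: it is what tells you, at the next update, whether a bug fix in the release notes means the rule deserves another try. This does nothing for the harder part of the job, which is judging whether a changed rule now behaves sensibly on your own code.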

So it can be a rather non-trivial task to work through all of this whilst keeping track of things that might need to be reviewed in future and getting to the point where you’re happy with your rules and all of your code builds cleanly. I’m currently finding that when there is a new release of any one of these tools, I generally need to plan for the update to take anywhere from half a day to a full day’s effort. Obviously the actual effort involved varies for each tool, as well as from release to release. In general it’s easiest when the tool simply provides a set of fixed rules and you are basically just selecting which of them you want to use, and a lot harder when your own rules files have to contain the actual definitions of the rules. So I generally find that FindBugs is the easiest to keep up to date, PMD the hardest, and Checkstyle falls somewhere between the two.

As with any software, there’s also a potential trade-off in how frequently you update each tool. If you keep up-to-date with every new release, you have to go through the update process most frequently but the individual updates are relatively small and it’s always clear what has changed. If you skip versions and update the tool only occasionally (e.g. before starting a major new project), you have fewer updates to do but each one brings a much larger set of changes and it’s sometimes hard to work out what the combined effect of the changes is. In the extreme, if you stick with one release for long enough it can become easiest to just drop all your existing rules files and documentation and start again from scratch. Personally I tend to try to keep as up-to-date as possible, partly because it keeps the changes reasonably manageable, but mainly because each new release tends to bring me useful new rules and bug fixes, and tends to reduce the number of work-arounds and rules I can’t use because of bugs – and it almost always shows me something in my code that can be improved.

The other major headache with updating these tools is when there are major changes to the Java language itself, as with Java SE 5. It can take a while for the tools to catch up with such changes, and to then work correctly with all of the changed language features under all conditions. By that time you’re probably also catching up on a lengthy list of changes and bug-fixes that have occurred in the meantime. This issue rather delayed my move to Java SE 5, and even after starting to use Java SE 5 I still ended up waiting several months before all of these tools could be used again, at which point it took me several days of work to get everything fully up-to-date. Hopefully this won’t be repeated too often, and in theory it should be getting easier for such tools to track language and compiler changes, so this might not be such a big deal next time.

On the whole I would highly recommend that anyone who doesn’t yet use these tools take a look at them and adopt at least one of them. FindBugs in particular provides a good set of very useful checks and requires minimal set-up and customization. Just be aware that unless you can use the tool exactly as it comes, with no customization or selection of rules, then there will be some degree of maintenance effort involved if you want to keep your use of the tool up-to-date. For me, the benefits gained from using these tools are well worth any overhead involved in keeping them up to date.




2 responses

26 02 2007

try maven instead of ant, much easier to use and configure tools like checkstyle

26 02 2007

I do intend to take a fresh look at maven at some point (didn’t win me over the last time I looked at it, but that was a while ago). But I’m not sure how much maven would help. The actual use of these tools in my Ant scripts is pretty simple and rarely needs any changes. It’s not so much the mechanics of updating the tools themselves, more the issue of examining what has/hasn’t changed in each release and deciding what changes to make to rules and/or code as a result (plus keeping track of why I’m not using particular rules, work-arounds that might be removable in future etc.). Also, to some extent, keeping the IDE plug-ins updated, as they tend to be somewhat temperamental. Or perhaps you could elaborate on how maven could help?
