Motivation
Dead code is everywhere. My guess is that most sizable software that has undergone at least one major rewrite, or is maintained by agile teams doing minor rewrites every week or so, carries at least one dead class, not to mention the odd dead method. With legacy code especially (many modules, older than the most senior programmer, few or no tests), the dead share of the code base can easily reach a double-digit percentage. So when dealing with such code, you may end up adapting classes you needn't bother with and shouldn't even try to understand. You should just delete them and grin.
And in times of continuous integration, dead code wastes energy and time in build queues that could otherwise be empty, free for work other than compiling and testing code that served its purpose a long time ago.
So, code analysis is a great thing. If your home is the Java world, you should know (and sometimes even make use of) great analysis tools such as FindBugs, PMD, or Checkstyle, to name probably the most prominent ones. These should at least help you get rid of some dead variables and methods. But those tools are usually only capable of validating local usage, which restricts them to finding unused private methods. And if you're dealing with code from a time when using the private modifier wasn't en vogue: poor you. And whole classes? Just forget about it.
I don't know of any reliable, automatable tool that helps to identify dead classes. Sure, IDEs tend to claim that certain classes are not used. But if you don't load all modules, that information may or may not be correct. And legacy code with dozens of modules? Try putting that in your IDE...
The quest for the dead
So, guess what: I actually had (and sometimes still have) to deal with such supposedly dead code. One approach I took to identify it was setting up a build in our TeamCity instance that executed IntelliJ's inspections. Well, there seems to be a limit for everything. I managed to get one finished build out of hundreds of executions. After upgrading to TeamCity 7, builds were finally running. It still took hours to perform the simplest checks (plus a bunch of others I just couldn't turn off, as this is a not-that-well-documented feature), and the report came up with many false positives, even ones you wouldn't expect: classes referenced in several Spring XML files, references that IntelliJ IDEA itself recognizes perfectly well. Again: there seems to be a limit for everything.
I'm a peaceful person, but sometimes you need to use brute force. So I wrote a bash script that looped through all classes: it removed one, executed a remote run against our continuous integration system, noted the result, reverted the change, and moved on to the next. Every other day I looked at the results, did a manual double-check (I got it wrong only once), actually removed the dead code, and started the script again.
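The loop described above can be sketched roughly like this. The original was a bash script driving TeamCity remote runs, so this Java version is only an illustration under stated assumptions: the class names are made up, and the hypothetical buildPasses predicate stands in for "remove the class, trigger a remote run, note the result".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class BruteForceSketch {

    /**
     * Tries leaving out each class in turn. A class is a removal candidate
     * if the (hypothetical) build still passes without it.
     */
    public static List<String> findRemovalCandidates(Set<String> classes,
                                                     Predicate<Set<String>> buildPasses) {
        List<String> candidates = new ArrayList<>();
        for (String candidate : classes) {
            // Work on a copy, so there is nothing to revert afterwards.
            Set<String> withoutCandidate = new LinkedHashSet<>(classes);
            withoutCandidate.remove(candidate);
            // Stand-in for the remote run against the CI system.
            if (buildPasses.test(withoutCandidate)) {
                candidates.add(candidate);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Set<String> classes =
                new LinkedHashSet<>(List.of("App", "Service", "OldHelper", "OldUtil"));
        // Toy build: App and Service are required; OldHelper needs OldUtil to compile.
        Predicate<Set<String>> build = present ->
                present.contains("App") && present.contains("Service")
                        && (!present.contains("OldHelper") || present.contains("OldUtil"));
        System.out.println(findRemovalCandidates(classes, build)); // prints [OldHelper]
    }
}
```

Note that OldUtil is not reported in this pass: it only becomes removable once OldHelper is gone, so a second run is needed to catch it.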
Even though I did this for only one of our 50+ modules (the one set up years ago to contain all the "business logic", which had accumulated 2.8k classes over the years), it took me (or, mainly, the script) months to finally run dry: there were whole chains of dead code, and the script only recognized one class after the other. Me and my script managed to eliminate more than 15% of the code base. And I moved another 10% of the code to the modules that really needed those classes. Man, I was pumped. And I thought: you're not the only one dealing with legacy code!
But when I decided to open source that thing, I realized that I could not: over the course of months, I had given the script as little care as an hour every other week. Apart from the fact that I don't know bash at all, the script was tightly coupled to our build setup and required TeamCity to run. As my parental leave was at hand, I shelved the idea.
The missing piece
During that time, some of my co-workers took another approach during the Scout IT Day (our flavor of the FedEx or ShipIt Day) and created a Maven plugin that chiefly used bytecode analysis to find out whether a class was still in use. And that's the starting point for deadcode4j.
deadcode4j is a Maven plugin that aims at finding dead code. It uses bytecode analysis to find dependencies between classes and additionally considers Spring XML files to determine whether a class is still in use. And this is just the beginning: considering the web.xml, .tld files, @nnotations, and more is in the works.
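To illustrate the idea (this is not deadcode4j's actual code), here is a minimal sketch of the last step: once the references have been collected, a class that nothing else refers to is a dead-code candidate. The findUnreferenced method, the class names, and the beans.xml source are all made up for this example, and the references map is assumed to have been extracted from bytecode and Spring XML files beforehand.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DeadClassSketch {

    /**
     * Returns all classes that no source refers to.
     * The keys of 'references' are sources (classes or, say, XML files),
     * the values are the classes each source refers to.
     */
    public static Set<String> findUnreferenced(Set<String> classes,
                                               Map<String, Set<String>> references) {
        Set<String> unreferenced = new TreeSet<>(classes);
        for (Map.Entry<String, Set<String>> entry : references.entrySet()) {
            for (String referenced : entry.getValue()) {
                // A class referring to itself does not keep itself alive.
                if (!referenced.equals(entry.getKey())) {
                    unreferenced.remove(referenced);
                }
            }
        }
        return unreferenced;
    }

    public static void main(String[] args) {
        Set<String> classes = Set.of("com.acme.Main", "com.acme.Service",
                "com.acme.OldHelper", "com.acme.OldUtil");
        Map<String, Set<String>> references = Map.of(
                "beans.xml", Set.of("com.acme.Main"),          // assumed Spring XML reference
                "com.acme.Main", Set.of("com.acme.Service"),
                "com.acme.OldHelper", Set.of("com.acme.OldUtil"),
                "com.acme.OldUtil", Set.of("com.acme.OldUtil")); // self-reference only
        System.out.println(findUnreferenced(classes, references)); // prints [com.acme.OldHelper]
    }
}
```

In this simple form, com.acme.OldUtil is not reported because the dead com.acme.OldHelper still refers to it. A chain like that only unravels once the first class has been deleted and the analysis is run again.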
And how can you use it? Simple: go to the console, navigate to your Maven project, type
mvn de.is24.mavenplugins:deadcode4j-maven-plugin:find -Dmaven.test.skip=true
and see what it finds.
Hop on over to GitHub to learn more, browse the code, or contribute.
Watch out for announcements, new releases and ideas at sebastiankirsch.blogspot.com