Finding duplicate code with CCFinder

Posted on August 5th, 2009 by David Luhman.

If you're attempting to refactor a large code base, give CCFinder a try.

I checked out the following tools suggested by the Wikipedia entry on duplicate code:
- Simian
- CCFinder
- PMD

Simian was easy to download, install and run, and it seemed to do a good job of listing the duplicate code sections. But it only produced a plain text list. I suppose I could have saved the output and sliced and diced it in Excel or something similar, but that would still leave much to be desired.

After looking at the screenshots of CCFinder, I decided to give it a try. It's free, and I do appreciate that, but the registration process was cumbersome. Its CAPTCHA is by far the worst I've seen; I failed it several times. There's also a password for the zip file, and a license you have to install.

It's also unclear whether you need Silverlight 2.x+, Python, .NET, etc. The installer seems to have been improving, but the whole experience could be a lot clearer.

I had some problems with my Java path (admittedly my problem), but I finally got the thing installed. I think it was worth the effort.

I was analyzing a PHP code base. Unfortunately, PHP wasn't explicitly supported, so I renamed the files to .cpp (close enough) and removed the "preprocess" option from the analyzer. Then I pointed it at the parent directory and let it run.
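If you need to do the same rename trick on a big tree, a small script saves a lot of clicking. The sketch below is my own assumption, not something that ships with CCFinder: rather than renaming in place, it copies each .php file into a separate mirror directory with a .cpp extension, so the original code base stays untouched. The function name `mirror_php_as_cpp` is hypothetical.

```python
import shutil
from pathlib import Path


def mirror_php_as_cpp(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every *.php file under src_root into dst_root, keeping the
    directory layout but changing the extension to .cpp.

    The original files are left untouched. Returns the list of copies made.
    """
    copies = []
    # sorted() materializes the file list before any copying starts.
    for php_file in sorted(src_root.rglob("*.php")):
        rel = php_file.relative_to(src_root)
        target = (dst_root / rel).with_suffix(".cpp")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(php_file, target)  # copy2 also preserves timestamps
        copies.append(target)
    return copies
```

For example, `mirror_php_as_cpp(Path("myproject"), Path("myproject_cpp"))` (both paths made up), then point CCFinder at `myproject_cpp`. Copying instead of renaming means there's nothing to rename back when you're done.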

You get a nice visualization showing where the blocks of duplicate code are. This page gives a good overview of how to navigate it and discover things:
http://www.ccfinder.net/doc/10.2/en/tutorial-gemx.html

I found it most useful to look at the sorted Clone-set Table (biggest clones highest). The source code pane on the right was helpful, but I often went to a diff tool (e.g., WinMerge) for further analysis.
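If you don't have a GUI diff tool handy, Python's standard `difflib` module can give a rough version of the same comparison once you've copied two clone fragments out of CCFinder. This is a minimal sketch; `diff_fragments` and the sample PHP snippets are made up for illustration.

```python
import difflib


def diff_fragments(a: str, b: str, name_a: str = "a", name_b: str = "b") -> str:
    """Return a unified diff of two code fragments as one string.

    An empty string means the fragments are identical line for line.
    """
    lines = difflib.unified_diff(
        a.splitlines(keepends=True),
        b.splitlines(keepends=True),
        fromfile=name_a,
        tofile=name_b,
    )
    return "".join(lines)


# Hypothetical pair of near-duplicate PHP fragments.
cart = "$total = 0;\nforeach ($items as $i) {\n    $total += $i->price;\n}\n"
invoice = "$sum = 0;\nforeach ($items as $i) {\n    $sum += $i->price;\n}\n"
print(diff_fragments(cart, invoice, "cart.php", "invoice.php"))
```

The lines marked `-` and `+` show where the two copies drift apart, which is usually where the parameter of a shared helper function would go.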

I guess it's obvious, but CCFinder also found intra-file clones, something you can't find with a two-file diff tool like WinMerge.
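To see why intra-file clones need something other than a pairwise diff, here is a deliberately naive, line-based sketch that looks for repeated runs of lines inside a single file. It is not CCFinder's algorithm (CCFinder compares normalized token sequences, which also lets it catch copies with renamed variables); the function name and the `window` parameter are my own assumptions.

```python
def find_intra_file_clones(source: str, window: int = 4) -> list[tuple[int, int]]:
    """Naive line-based clone finder for a single file.

    Returns (first_line, repeat_line) pairs of 1-based line numbers where
    `window` consecutive non-blank lines, compared after stripping leading
    and trailing whitespace, occur more than once.
    """
    # Keep (original line number, normalized text) for non-blank lines only.
    lines = [
        (n, text.strip())
        for n, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen = {}  # window contents -> line number where it first started
    clones = []
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        start_line = lines[i][0]
        if chunk in seen:
            clones.append((seen[chunk], start_line))
        else:
            seen[chunk] = start_line
    return clones
```

Each result pairs the first occurrence of a run with a later repeat. A duplicated block longer than `window` yields one pair per overlapping window, so a real tool would merge those into a single clone set.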

Anyway, one thumb down for CCFinder's install experience, but two thumbs up for an otherwise nice, free tool.