If you're attempting to refactor a large code base, give CCFinder a try.
I checked out the following tools suggested by the Wikipedia entry on duplicate code:
- PMD Target
Simian was easy to download, install, and run, and it seemed to do a good job of listing the duplicated sections in the code. But it only produced a text list. I suppose I could have saved the output and sliced and diced it a bit in Excel or something, but that would still have left much to be desired.
After checking out the screenshots of CCFinder, I decided to give it a try. It's free, and I *do* appreciate that, but the registration process was pretty cumbersome. Its CAPTCHA is by far the worst I've seen; I failed it several times. The zip file is also password-protected, and there's a license you have to install.
It's also unclear whether you need Silverlight 2.x+, Python, .NET, etc. I think the installer has been improving, but the whole experience needs some clarity.
I had some problems with my Java path (admittedly my problem), but I finally got the thing installed. I think it was worth the effort.
I was analyzing a PHP code base. Unfortunately, PHP wasn't explicitly supported, so I had to rename the files to .cpp (close enough) and remove the "preprocess" option from the analyzer. Then I pointed it at my parent directory and let it go.
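If you'd rather not rename files in your real working copy, the renaming step can be done on a scratch copy instead. Here's a minimal sketch of that idea in Python; the function name `mirror_php_as_cpp` and the directory names are my own placeholders, not anything CCFinder provides, and it only copies and renames files (it knows nothing about the analyzer's options).

```python
import shutil
from pathlib import Path


def mirror_php_as_cpp(src_root, dst_root):
    """Copy every .php file under src_root into dst_root with a .cpp suffix.

    Hypothetical helper for illustration: the original tree is left
    untouched, and the directory layout is preserved in the copy.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    copied = []
    for php_file in sorted(src_root.rglob("*.php")):
        # Same relative path as the original, but with a .cpp extension.
        target = dst_root / php_file.relative_to(src_root).with_suffix(".cpp")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(php_file, target)
        copied.append(target)
    return copied
```

Something like `mirror_php_as_cpp("myproject", "myproject_cpp")` would then give you a throwaway directory to point the analyzer at, assuming those paths exist on your machine.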
You get a nice visual showing where the blocks of duplicate code are. This page gives a good overview of how to navigate and discover things:
I found it most useful to look at the sorted Clone-set Table (largest clones first). The source code on the right was helpful, but I often went to a diff tool (e.g. WinMerge) for further analysis.
I guess it's obvious, but CCFinder also found intra-file clones, something you can't find with a file-to-file diff tool like WinMerge.
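To make "intra-file clone" concrete, here's a toy sketch in Python that looks for runs of identical lines repeated within a single file. To be clear, this is not how CCFinder works (I haven't looked at its internals); it's just a naive line-window comparison, and the function name `repeated_blocks` and the `window` parameter are made up for the example.

```python
from collections import defaultdict


def repeated_blocks(text, window=3):
    """Return {block_of_lines: [1-based start lines]} for repeated blocks.

    A "block" is `window` consecutive lines, compared after stripping
    leading and trailing whitespace. Windows containing a blank line
    are skipped to cut down on noise.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = tuple(lines[i:i + window])
        if all(block):
            seen[block].append(i + 1)
    # Keep only blocks that occur more than once in the same file.
    return {block: starts for block, starts in seen.items() if len(starts) > 1}
```

Feeding it the contents of one file (for example `repeated_blocks(open("some_file.php").read())`, with a path of your own) lists the line numbers where the same block shows up more than once, which is exactly the kind of thing a two-file diff never gets to see.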
Anyway, one thumb down for CCFinder's install experience, but two thumbs up for an otherwise nice, free tool.