I wanted to detect duplicate files and remedy the situation. There are a lot of reasons one might end up with duplicate files, and I was subject to a goodly number of them, especially multiple separate backups and multiple copies modified on different machines...

Detecting duplicate files is relatively simple; a few lines of shell script can do it. Doing useful things securely afterwards is much more difficult. And optimizing a rerun on a big set of files that has only changed a bit needs a real program and some sort of database.
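To illustrate just how little the detection step needs, here is a sketch of the classic pipeline (GNU coreutils assumed; the scratch directory and file names exist only for the demo):

```shell
# Make a scratch directory holding one duplicate pair and one unique file.
tmp=$(mktemp -d)
printf 'same content\n'  > "$tmp/a.txt"
printf 'same content\n'  > "$tmp/b.txt"
printf 'other content\n' > "$tmp/c.txt"

# Hash everything, sort by hash, and print only files whose hash repeats.
# (-w32 compares just the 32-hex-digit md5; GNU uniq's --all-repeated
# prints each group of duplicates, groups separated by a blank line.)
find "$tmp" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate
```

That finds the duplicates; it says nothing about which copy to keep or how to get rid of the others safely, which is where the real work starts.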

I thought I couldn't be the first to want that. So I looked. And looked. And found lots and lots of tools. And nothing I found was to my liking.

I wanted a minimal tool that I could install and run on a distant server. I wanted a tool that I could run on just about any Unix-based OS. I wanted a tool that would automatically take care of the duplicates it would find using a simple set of rules that I would give it. I wanted a tool that would not choke and run out of memory when I set it to work on a terabyte volume with multiple copies of cp -al / rsync backups. I wanted a tool that would remember the results of previous runs, to avoid unnecessary work.

I found Duplicate File Finder (the site seems to be down but a description is here). It needs Java and X11. I found Duplicate Files Searcher. It needs Java, probably doesn't have a CLI, and doesn't seem to be FOSS. I found fdupes, but actions are not automatic, are not secured enough, and there is no persistent database. I found the classic and still actively maintained fslint, which has a lot more options than I had thought of, but still no automation that I could see.

Since I didn't find what I wanted, I wrote it.

It's not perfect. I could add features, and I probably will: optimizing checksumming would help a lot when you have lots of big files that change frequently (not my case, but easy to do), and testing for added-to mboxes would be a feature I haven't found elsewhere. There may be bugs (I have yet to set it to work on the terabyte volume I was talking about, let alone benchmark it).
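For the curious, that checksum optimization might look something like this: cache each file's hash keyed on its size and mtime, and re-hash only when the key changes. Everything here (the flat-file cache, the hash_cached helper) is a hypothetical sketch, not how my tool works, and it assumes GNU stat and paths without spaces:

```shell
# Hypothetical flat-file cache: one "md5sum size:mtime:path" line per file.
cache=$(mktemp)

hash_cached() {
    # Re-hash a file only if its size or mtime changed since the last run.
    f=$1
    key="$(stat -c '%s:%Y' "$f"):$f"        # GNU stat: size, mtime, path
    sum=$(awk -v k="$key" '$2 == k { print $1 }' "$cache")
    if [ -z "$sum" ]; then                  # cache miss: hash it, record it
        sum=$(md5sum "$f" | awk '{ print $1 }')
        printf '%s %s\n' "$sum" "$key" >> "$cache"
    fi
    printf '%s  %s\n' "$sum" "$f"
}
```

A second call on an unchanged file is answered from the cache; a touched or grown file gets a new key and is hashed again, so big, stable files are read only once across runs.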

But I like it, and maybe you will too.

The man page is here and thanks to SourceForge all the rest is here.

Do take care, though. This is alpha-quality software, not because I wrote it sloppily, but because basically no one has really tested it. I especially have no idea how it might act on non-local filesystem mounts. It's a program that deletes files, so take care: either make a backup first, or read the code and step through all the debugging options to make really sure it's doing what you expect it to do.