Dupe finder+grouping

I have quite a large library, built up from buying a LOT of used CDs over time, and therefore also many dupes of different kinds. Most dupes I won't actually get rid of; I'll just mark them as "unsuitable for shuffle playback" due to:

* seamless transitions on albums or crossfades on compilations
* different cuts of basically the same song (radio edit/album version/extended etc)
* slightly different masterings, remasters etc
* simply too many dupes (they occur too often)
... you probably get the idea

My biggest headache here is inconsistent naming. I know I could "just" fix the tags, but that's a lot of music and too many days of work, and I haven't even decided whether a "standard" version of a song on a single should be tagged as "radio edit" when it is identical to the album version. So the best way to find all the dupes would be a tool that uses acoustic fingerprinting. We have that in foo_biometrics. Unfortunately, in my experience it often misses identical songs that are also in the library, and I've also seen large groups of false positives in the same search that returns neatly grouped dupes.

So, to the question: does anyone know of any other similar dupe checker? Or at least one that can apply a similarity threshold to the tags and group the matches together, so I can (load them into foobar2000 and) decide for myself what to do with them?

(Note: has to support FLAC)
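For the tag-based fallback, here is a minimal Python sketch of threshold grouping. It rests on a few assumptions: tags have already been read into (path, artist, title) tuples (for example exported from foobar2000; tag reading is left out), the `VERSION_WORDS` list is hypothetical and should be adapted to your own naming, and difflib's `SequenceMatcher` ratio stands in for whatever similarity measure a real dupe checker would use. Matches are merged with union-find, so if A matches B and B matches C, all three land in one group.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of words that mark a "version" qualifier in () or [].
# Substrings match, so "remaster" also covers "2011 Remaster" and "Remastered".
VERSION_WORDS = ("edit", "version", "extended", "remaster", "mix")


def normalize(text):
    """Lower-case, drop version qualifiers in () or [], collapse punctuation and spaces."""
    text = text.lower()

    def drop_if_version(match):
        inner = match.group(1)
        return "" if any(word in inner for word in VERSION_WORDS) else inner

    text = re.sub(r"[(\[]([^)\]]*)[)\]]", drop_if_version, text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def group_similar(tracks, threshold=0.9):
    """tracks: list of (path, artist, title) tuples.
    Returns groups of paths whose normalized "artist | title" keys are similar."""
    keys = [normalize(artist) + " | " + normalize(title) for _, artist, title in tracks]
    parent = list(range(len(tracks)))  # union-find forest, one node per track

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair (O(n^2)) and merge the groups of pairs above the threshold.
    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, (path, _, _) in enumerate(tracks):
        groups.setdefault(find(i), []).append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

With this, "Song Title (Radio Edit)", "Song Title" and "Song Title [2011 Remaster]" by the same artist end up in one group for manual review. Since every pair is compared, it gets slow on a very large library; bucketing by normalized artist first would cut that down.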
Can't wait for a HD-AAC encoder :P


Reply #2
I am also looking for something similar. In my case (a collection made only of CDImage.flac files, with the offset "fixed" to the highest in the AccurateRip (AR) database), something that would read the FLAC MD5 checksums of the audio data already stored in the files and compare them would be the perfect tool.
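FLAC stores an MD5 of the unencoded audio in its STREAMINFO block, so files with identical audio share the same MD5 regardless of tags or compression settings (for CD images, only when the offset is the same, which is why normalizing offsets matters here). A minimal Python sketch that reads that field directly and groups files by it, assuming each file starts with the "fLaC" marker (files with a leading ID3v2 tag are simply skipped) and treating an all-zero MD5 as "not stored". The function names are hypothetical; `metaflac --show-md5sum` prints the same value if you prefer a command-line route.

```python
import os
from collections import defaultdict


def streaminfo_md5(path):
    """Return the MD5 stored in a FLAC file's STREAMINFO block as hex, or None."""
    with open(path, "rb") as f:
        # "fLaC" marker (4 bytes) + metadata block header (4) + STREAMINFO body (34)
        header = f.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        return None  # not a bare FLAC stream (e.g. a leading ID3v2 tag); skipped here
    if header[4] & 0x7F != 0:
        return None  # the first metadata block must be STREAMINFO (type 0)
    md5 = header[26:42]  # last 16 bytes of the 34-byte STREAMINFO body
    if md5 == bytes(16):
        return None  # all zeros: the encoder did not store an MD5
    return md5.hex()


def group_by_md5(root):
    """Walk root and return {md5: [paths]} for MD5s shared by more than one .flac file."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".flac"):
                path = os.path.join(dirpath, name)
                digest = streaminfo_md5(path)
                if digest:
                    groups[digest].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```

Only the first 42 bytes of each file are read, so even a multi-terabyte collection is scanned quickly; `group_by_md5("/path/to/music")` returns the duplicate sets.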

A while ago I had the idea of comparing the .accurip files with a classic (non-audio-specific) dupe finder: since I have fixed all the offsets to the highest in the database, I instantly spot a duplicate when I open two .accurip files manually. The problem is that each .accurip records the verification date and time, so I can't compare them against each other, because the date is always different (the increase in confidence is not really a problem, as I can do it all within a month). If I don't find a better way, I may look for a batch text editor to erase the one line [Verification date: XX/XX/XXXX XX:XX:XX] from all my .accurip files (after renaming them to .txt); then I could finally compare them "easily" ... as you can see, the idea is very convoluted when the right tool for the job could do it in one click ...

Even if it's a complete waste of time to destroy all my .accurip files to find duplicates and then recheck a month later ... I guess that in my case (several terabytes) it would be faster and easier than doing it manually.
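The .accurip comparison doesn't actually require editing or renaming any files: a script can hash each log while skipping the date line in memory and then group identical hashes. A minimal Python sketch, assuming (only from the description above) that the line to ignore starts with "[Verification date:" and that the logs are UTF-8 text; the prefix is a parameter in case the real files differ, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def accurip_fingerprint(path, skip_prefix="[Verification date:"):
    """Hash a .accurip log, ignoring lines that start with skip_prefix.
    The file itself is only read, never modified or renamed."""
    h = hashlib.sha1()
    # Assumes UTF-8 text; "utf-8-sig" also drops a leading byte-order mark.
    with open(path, "r", encoding="utf-8-sig", errors="replace") as f:
        for line in f:
            if line.strip().startswith(skip_prefix):
                continue  # assumed date line; differs between otherwise identical logs
            h.update(line.rstrip("\r\n").encode("utf-8") + b"\n")
    return h.hexdigest()


def group_accurip(root, skip_prefix="[Verification date:"):
    """Return lists of .accurip paths whose contents match once the date line is ignored."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".accurip"):
                path = os.path.join(dirpath, name)
                groups[accurip_fingerprint(path, skip_prefix)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Because the original logs stay untouched, there is nothing to recheck a month later.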

Reply #3
Thanks for the suggestion. I was very optimistic about MIP; I used it some time ago as a background service to create nice mixes. However, I couldn't make it find "dupes" until I added a copy of the files in a different location, so it doesn't seem to use its nice fingerprinting technique for that.

Reply #4
I just discovered a freeware called Similarity.

I haven't tried it myself yet and I don't know whether it would suit your needs in detail, but I thought I should mention it because it looks promising, with its configuration options and support for MP3, FLAC, WavPack and other formats.
This is HA. Not the Jerry Springer Show.