Full Version: Is there a Tool for MP3 checksum generation?
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - General
hsc
Hi,

I'm looking for a tool which is able to compute checksums (e.g. MD5) for the audio part of MP3 files, so that tag changes do not invalidate the computed checksum. Is there something available for this?

Thanks,
Horst
Synthetic Soul
LAMETag will report both the recorded CRC and the actual CRC for a file encoded by LAME. Here's an example report:

CODE
Music CRC:          DB0C
Actual Music CRC:   DB0C
Info tag CRC:       9A99
Actual InfoTag CRC: 9A99

According to this post sn0wman may be working on an application to suit your needs. Perhaps a little gentle persuasion may help him/her along... ;)
Florian
Mp3tag's export feature can export the MD5 hash of the audio part via %_md5audio%.

To verify the hashes, you have to compare the output yourself (or with a small script).
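
For readers who want to see what a tag-independent hash can look like in principle, here is a minimal Python sketch. It is not Mp3tag's actual algorithm (the thread does not describe how %_md5audio% is computed). It simply assumes that the only tags present are a leading ID3v2 tag and a trailing ID3v1 tag, cuts both off, and hashes the rest. The function name `audio_md5` is made up for this illustration, and APEv2 or Lyrics3 tags are not handled.

```python
import hashlib


def audio_md5(data: bytes) -> str:
    """MD5 of an MP3 byte string with a leading ID3v2 and trailing ID3v1 tag removed.

    Assumption: no other tag types (APEv2, Lyrics3) are present.
    """
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 "syncsafe" size bytes
    # (7 significant bits per byte). The size excludes the 10-byte header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer flag: a 10-byte footer follows the tag
            start += 10

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return hashlib.md5(data[start:end]).hexdigest()
```

Usage would be something like `audio_md5(open("song.mp3", "rb").read())`; the value stays the same when only the ID3 tags are edited.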
hsc
QUOTE (Synthetic Soul @ Mar 9 2006, 02:14 AM)
LAMETag will report both the recorded CRC and the actual CRC for a file encoded by LAME.  Here's an example report:

CODE
Music CRC:          DB0C
Actual Music CRC:   DB0C
Info tag CRC:       9A99
Actual InfoTag CRC: 9A99

According to this post sn0wman may be working on an application to suit your needs.  Perhaps a little gentle persuasion may help him/her along... ;)
*


Thanks for the fast reply. Unfortunately I can't use the LAME tag feature, because I still have some really old MP3s where this feature is not available (I guess). But the sn0wman pointer seems to be very promising. Do you know if there is any active development underway? Also, somewhere in that topic a tool named "mp3-vaccinator" was mentioned, which might already be what I'm looking for. But I was not able to find it anywhere, even with the help of Google 8(
hsc
QUOTE (Ganymed @ Mar 9 2006, 02:50 AM)
Mp3tag's export feature can export the MD5 hash of the audio part via %_md5audio%.

To verify the hashes, you have to compare the output by yourself (or by a small script).
*


I'm using mp3tag now for quite some time (great tool), but have never seen that feature. I just tried it and it seems to work quite well. I was just struggling with the "One file per directory" option. How is this supposed to work? I expected to find an mp3tag.html file in each subdirectory next to the MP3 files, but the tool wrote the file n times into the current working directory, so each write overwrote the previous one and only the last instance survived (using version 2.34a).
Of course I would need to write a tool to help compare the output, but I don't think that is very difficult. Still, it would be a big help if the "One file per directory" option really put the output file into the respective subdirectories.

Thanks,
Horst
Synthetic Soul
I don't know anything more about sn0wman's work than is visible in that thread. Why not PM him?

If MP3Tag can export the MD5 for any file (not just LAME-encoded files, as per LAMETag) then why not load all of your files into MP3Tag and export a list?

When you want to do a comparison, reload all the files and export another list. Then use a text file comparison tool like WinMerge to spot any differences.

Depending on the MP3Tag report it's possible you may need to parse the reports first, to ensure that only the MD5 values may change (e.g.: if the report has the current date/time on each line).

It's an option. How would you like to see this process work?
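
As a rough illustration of the "parse the reports first" idea, the Python sketch below pulls only the file name and the 32-digit hex MD5 out of each report line, so any extra columns such as a date/time are ignored when the two exports are compared. The line layout (file name, then the MD5 somewhere after it) is an assumption, as are the function names `parse_report` and `changed_files`.

```python
import re

# A standalone run of exactly 32 hex digits is treated as the MD5 value.
MD5_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")


def parse_report(text):
    """Return {file name: lowercase MD5}; lines without an MD5 are skipped.

    Assumption: everything before the MD5 on a line is the file name.
    """
    hashes = {}
    for line in text.splitlines():
        match = MD5_RE.search(line)
        if match:
            name = line[:match.start()].strip()
            hashes[name] = match.group(0).lower()
    return hashes


def changed_files(old_text, new_text):
    """Names whose MD5 differs between the reports, or that exist in only one."""
    old, new = parse_report(old_text), parse_report(new_text)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))
```

Comparing in lowercase means an upper-case and a lower-case export of the same hash are treated as equal.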

Edit:

QUOTE (hsc @ Mar 9 2006, 12:21 PM)
I'm using mp3tag now for quite some time (great tool), but have never seen that feature. I just tried it and it seems to work quite well.
Can I/we see a small example of a report please? If it's HTML perhaps you could upload it to some free webspace?
SebastianG
http://homepages.uni-paderborn.de/sgeseman/mp3d5.jar (java command line tool)

Feel free to hack the source (it's included). Once you've downloaded this file and you've installed a Java Runtime Environment you can start it like this:

java -jar mp3d5.jar <somemp3file>

It computes an MD5 hash of the main data sections of all frames (excluding the VBR header frame). It is thus resistant to tag changes as well as gain changes via mp3gain and can be used to identify MP3 files (but not to check their integrity, since not all of the data is processed).

edit: There's now an option to compute a "quick" hash (over the first 1148 audio frames, which is approximately 30 seconds at MPEG1, 44.1 kHz). The checksum will be different, of course.
java -jar mp3d5.jar <somemp3file> quick
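
The "approximately 30 seconds" figure can be checked with a line of arithmetic: an MPEG-1 Layer III frame carries 1152 samples, so the duration is frames × 1152 / sample rate. A small Python sketch (the function name is just for illustration):

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frames_duration(frames, sample_rate_hz):
    """Playing time in seconds covered by the given number of frames."""
    return frames * SAMPLES_PER_FRAME / sample_rate_hz


print(round(frames_duration(1148, 44100), 2))  # 29.99
```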

Sebi
Florian
QUOTE (hsc @ Mar 9 2006, 01:21 PM)
I'm using mp3tag now for quite some time (great tool), but have never seen that feature. I just tried it and it seems to work quite well. I was just struggling with the "One file per directory" option. How is this supposed to work? I expected to find an mp3tag.html file in each subdirectory next to the MP3 files, but the tool wrote the file n times into the current working directory, so each write overwrote the previous one and only the last instance survived (using version 2.34a).

You have to tell Mp3tag to store the export files in the subdirectories with a format string like
CODE
%_directory%\%_directory%.md5
hsc
QUOTE (Synthetic Soul @ Mar 9 2006, 04:26 AM)
I don't know anything more about sn0wman's work than is visible in that thread.  Why not PM him?

If MP3Tag can export the MD5 for any file (not just LAME-encoded files, as per LAMETag) then why not load all of your files into MP3Tag, and export a list.

When you want to do a compare reload all the files again and export another list.  Then use a text file comparison tool like WinMerge to spot any differences.

Depending on the MP3Tag report it's possible you may need to parse the reports first, to ensure that only the MD5 values may change (e.g.: if the report has the current date/time on each line).

It's an option.  How would you like to see this process work?

Edit:

QUOTE (hsc @ Mar 9 2006, 12:21 PM)
I'm using mp3tag now for quite some time (great tool), but have never seen that feature. I just tried it and it seems to work quite well.
Can I/we see a small example of a report please? If it's HTML perhaps you could upload it to some free webspace?
*



MP3Tag's export can be customized, and besides HTML you can also get a .csv output style, which makes parsing quite easy. As for the process/workflow, I'm currently considering the following option:
(1) Use mp3tag to generate one reference MD5 file (e.g. baseline.csv) per subdirectory. This would serve as the reference.
(2) For verification:
(2.1) Basically repeat step (1), but with a different output filename (e.g. new.csv).
(2.2) Start a tool (which needs to be written) which scans all subdirectories from a given starting point. For directories with both a baseline MD5 file and a newly created MD5 file: parse both and log an error message if the MD5 checksums differ.
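
Step (2.2) could be sketched in Python roughly as follows. This is only an outline under assumptions: each export is taken to be a comma-separated file with the file name in the first column and the MD5 in the second, and the names of the baseline and new reports are parameters. The function names `read_md5_csv` and `verify_tree` are invented for this example.

```python
import csv
import os


def read_md5_csv(path):
    """Read {file name: lowercase MD5} from a two-column CSV export (assumed layout)."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return {row[0]: row[1].lower() for row in csv.reader(handle) if len(row) >= 2}


def verify_tree(root, baseline_name="baseline.csv", new_name="new.csv"):
    """Walk all subdirectories and return one message per changed or missing file."""
    errors = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only directories that contain both reports can be verified.
        if baseline_name in filenames and new_name in filenames:
            old = read_md5_csv(os.path.join(dirpath, baseline_name))
            new = read_md5_csv(os.path.join(dirpath, new_name))
            for name, md5 in sorted(old.items()):
                if new.get(name) != md5:
                    errors.append("%s: expected %s, got %s" % (
                        os.path.join(dirpath, name), md5, new.get(name, "missing")))
    return errors
```

Calling `verify_tree("D:\\music")` (path made up) would return an empty list when nothing has changed.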
hsc
QUOTE (Ganymed @ Mar 9 2006, 05:59 AM)
You have to tell Mp3tag to store the export files in the subdirectories with a format string like
CODE
%_directory%\%_directory%.md5

*


Thanks, that was the information I was missing.
Florian
QUOTE (Synthetic Soul @ Mar 9 2006, 01:26 PM)
QUOTE (hsc @ Mar 9 2006, 12:21 PM)
I'm using mp3tag now for quite some time (great tool), but have never seen that feature. I just tried it and it seems to work quite well.
Can I/we see a small example of a report please? If it's HTML perhaps you could upload it to some free webspace?
*


Synthetic Soul, have a look at Mp3tag's Export Configuration Archive. The export configurations are built upon a very simple syntax, but you can export to almost every text-based file format (HTML playlists, DDL scripts for import into a database, RTF or LaTeX) :)
Synthetic Soul
I like the idea of configurable export formats.

Good work.

I've taken a look at an MTE file and it looks easy enough. Basically it's a template file that uses script to render the dynamic content, with the $loop(<field>)/$loopend() calls to determine which parts of the template to repeat.

So a simple MD5 report file may be something as simple as:

CODE
$loop(%_filename_ext%)
%_filename_ext% %_md5audio%
$loopend()

Cool.

Edit: spelling
Synthetic Soul
NB: A better version is:

CODE
$filename(txt)$loop(%_filename_ext%)%_filename_ext% %_md5audio%
$loopend()

SebastianG's tool and MP3Tag create different MD5s, by the way. I'm not saying either is wrong; I expect they just use different methods, but you can't use one to check the other.

I have written a batch file to use SebastianG's mp3d5.jar to process files and folders. Make sure you change the path to mp3d5.jar in line 13 if you test it out.

mp3d5 report:

CODE
01 - The View From The Afternoon.mp3 ce49bc6b121f1d230a74d575c29a83e8
02 - I Bet You Look Good On The Dancefloor.mp3 2a75656bc74484dbb717825484478069
03 - Fake Tales Of San Francisco.mp3 09e17dc3e4c48db360fe88b68cb26b80
04 - Dancing Shoes.mp3 0061fdd61e2ab1e456192b0902896659
05 - You Probably Couldn't See For The Lights But You Were Staring Straight At Me.mp3 3cf15c63f390b40f154d257058e81cfe
06 - Still Take You Home.mp3 382df27cdda2217ce9bd36fae8254dd6
07 - Riot Van.mp3 5c6af8525acd8a5445c6e41d6ba5e700
08 - Red Light Indicates Doors Are Secured.mp3 05c8dddb732e3cd68a187ee6b57ac971
09 - Mardy Bum.mp3 943ff028e45b3d552b965affdfbd9434
10 - Perhaps Vampires Is A Bit Strong But...mp3 17bf7d7d5cf9237ab170bbceda4e8726
11 - When The Sun Goes Down.mp3 f16f0e10cb9cb3bbef1558344f54d53b
12 - From The Ritz To The Rubble.mp3 599b8d2e29aedffbc89dde906e05409f
13 - A Certain Romance.mp3 e225c0943dc28259d0cb66e1aa7a0ef7

MP3Tag report:

CODE
01 - The View From The Afternoon.mp3 85E069638E7A99F9A844B3D434EE16FF
02 - I Bet You Look Good On The Dancefloor.mp3 B33CFC37B03D47FC2489CFE721C2DA27
03 - Fake Tales Of San Francisco.mp3 898C2FFF89CBCFC58219012D56E1E22F
04 - Dancing Shoes.mp3 D11E0796973AF5640649305F699AE886
05 - You Probably Couldn't See For The Lights But You Were Staring Straight At Me.mp3 8AD8026ED0F6ACA35A4CF3616C0188E9
06 - Still Take You Home.mp3 3E7F64A6D7E5F14CE7317884480BB7BF
07 - Riot Van.mp3 960AE2890DD628AD42E9B6C592CF3E4E
08 - Red Light Indicates Doors Are Secured.mp3 B65F06D83269AB5D2403777957B7F799
09 - Mardy Bum.mp3 502628576F9F9F95CEDD57121E52995B
10 - Perhaps Vampires Is A Bit Strong But...mp3 35ADABD52F8A6019F6282439E255A2EA
11 - When The Sun Goes Down.mp3 B661085E4B39769FFCF6ECF99773FD72
12 - From The Ritz To The Rubble.mp3 A63E178CDF898A23728E137E852DBF22
13 - A Certain Romance.mp3 FB92FA86C7F8073358CF507B34EBA1EC
Florian
QUOTE (Synthetic Soul @ Mar 9 2006, 04:43 PM)
SebastianG's tool and MP3Tag both create different MD5s by the way.  I'm not saying either is wrong, I expect they both just use different methods, but you can't use one to check the other.
Yes, this is probably because Mp3tag includes the VBR header in the hash (because it isn't specific to MP3 - you can use it for Musepack or MP4 AAC too) :)
SebastianG
mp3d5 only covers the main data sections of audio frames. The intended use is identification while being robust to changes of the meta data (including global gain factors, private bits, VBR header frame).
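
To make the idea concrete, here is a rough Python sketch of hashing only the bytes that follow the header, optional CRC and side information in each frame. It is not the mp3d5 source, just an approximation under stated assumptions: MPEG-1 Layer III only, no free-format bitrate, any tags already stripped from the input, and a first frame carrying a "Xing" or "Info" marker is skipped as the VBR header. The name `main_data_md5` is invented here.

```python
import hashlib

# MPEG-1 Layer III tables; index 0 (free format) and 15 (invalid) are rejected below.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def main_data_md5(data: bytes) -> str:
    """MD5 over each frame's bytes after header, CRC and side info (sketch only)."""
    md5 = hashlib.md5()
    pos = 0
    while pos + 4 <= len(data):
        header = int.from_bytes(data[pos:pos + 4], "big")
        bitrate_idx = (header >> 12) & 0xF
        rate_idx = (header >> 10) & 0x3
        # Top 15 bits: 11 sync bits, version "11" (MPEG-1), layer "01" (Layer III).
        if (header >> 17) != 0x7FFD or bitrate_idx in (0, 15) or rate_idx == 3:
            pos += 1  # not a usable frame header, try the next byte
            continue
        padding = (header >> 9) & 1
        frame_len = 144000 * BITRATES_KBPS[bitrate_idx] // SAMPLE_RATES_HZ[rate_idx] + padding
        crc_len = 0 if (header >> 16) & 1 else 2           # protection bit 0 means CRC present
        side_len = 17 if ((header >> 6) & 0x3) == 3 else 32  # mono vs. other channel modes
        body_start = pos + 4 + crc_len + side_len
        # Skip the VBR header frame, which stores "Xing" or "Info" after the side info.
        if data[body_start:body_start + 4] not in (b"Xing", b"Info"):
            md5.update(data[body_start:pos + frame_len])
        pos += frame_len
    return md5.hexdigest()
```

Because the header (private bits) and the side information (global gain) never enter the hash, changes to those fields leave the value untouched, which is the robustness property described above.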

To speed things up I suggest using the "quick hash" mode. This will produce other hash values but it's a lot faster. It only covers the first part of the file but it still serves its purpose IMHO.

Edit: The main class is only about 70 lines of simple code. I think it's easy to add functionality (like directory scanning & writing to one file) if you've already written a hello world app in Java. :)
http://homepages.uni-paderborn.de/sgeseman/Mp3d5.java.txt

Sebi