Mar 31 2010, 02:27
Joined: 16-October 03
Member No.: 9337
I have only now become aware of Gregory S. Chudov's effort to develop CTDB (CUETools DB). I am very excited about this as I have actually been suggesting this to spoon (dbpoweramp's developer for over 5 years)
for others that have missed it:
What's it for?
You probably heard about AccurateRip, a wonderfull database of CD rip checksums, which helps you make sure your CD rip is an exact copy of original CD. What it can tell you is how many other people got the same data when copying this CD. CUETools Database is an extension of this idea.
What are the advantages?
* The most important feature is the ability not only to detect, but also correct small amounts of errors that occured in the ripping process.
* It's free of the offset problems. You don't even need to set up offset correction for your CD drive to be able to verify and what's more important, submit rips to the database. Different pressings of the same CD are treated as the same disc by the database, it doesn't care.
* Verification results are easier to deal with. There are exactly three possible outcomes: rip is correct, rip contains correctable errors, rip is unknown (or contains errors beyond repair).
* If there's a match, you can be certain it's really a match, because in addition to recovery record database uses a well-known CRC32 checksum of the whole CD image (except for 10*588 offset samples in the first and last seconds of the disc). This checksum is used as a rip ID in CTDB.
What are the downsides and limitations?
* CUETools DB doesn't bother with tracks. Your rip as a whole is either good/correctable, or it isn't. If one of the tracks is damaged beyound repair, CTDB cannot tell which one.
* If your rip contains errors, verification/correction process will involve downloading about 200kb of data, which is much more than it takes for AccurateRp.
* Verification process is slower than with AR.
* Database was just born and at the moment contains much less CDs than AR.
How many errors can a rip contain and still be repairable?
* That depends. The best case scenario is when there's one continuous damaged area up to 30-40 sectors (about half a second) long.
* The worst case scenario is 4 non-continuous damaged sectors in (very) unlucky positions.
What information does the database contain per each submission?
* CD TOC (Table Of Contents), i.e. length of every track.
* Offset-finding checksum, i.e. small (16 byte) recovery record for a set of samples throughout the CD, which allows to detect the offset difference between the rip in database and your rip, even if your rip contains some errors.
* CRC32 of the whole disc (except for some leadin/leadout samples).
* Submission date, artist, title.
* 180kb recovery record, which is stored separately and accessed only when verifying a broken rip or repairing it.
May 10 2010, 08:06
Joined: 4-May 08
Member No.: 53282
Well the two problems cases are very different:
- in the first problem case the AR database detects nothing wrong while the full checksum for the first track are different within the same pressing, this means that the problem is located inside the 5 frames ignored by AR.
This case have nothing to do with CTDB, I haven't tried CTDB on it, but I have deleted the bad doublons so I can't test anymore. Within all logic CTDB would have been of no help here because the difference being located ouside of the reach of the AR checksum it can only be located outside of the reach of the CTDB checksum (As CTDB ignores 5 added frames compared to AR.)
In this first problem case, I have only losslessly splitted the 5 first frames (as you've noticed it was painfull) & compared the Flac MD5 checksum which was different (The EAC wav comparator also detected differences). I cannot hear any difference (5 frames too too short to ABX anyway) & when I zoom on the 5 frames with any audio editor all I see is flat.
My first idea was naturally to think about a scratch, but I wasn't sure so I asked for advice & you told me that this was possibly due to an -472 offset problem due to overread. My skill with an audio editor to obtain the same result were too weak. But I think that the explanation is logic & I know how serious you can be so I tend to trust you.
But IMHO this second case is slightly different, maybe you're right & maybe this is an even bigger overread into the lead-in problem ... but as far as I understund the 5 frames ignored by AR are already there to avoid this kind of problem. So the problem is beyond the safety margin which means that this time it is more likely to be a simple scratch. I have already deleted the un-corrected file so I cannot verify (but even if I would still have it the first problem case showed that I didn't knew how to do)
Anyway all this doesn't really matter what I wanted to illustrate is that it is possible to have a scratch detected by AR by that cannot be corrected by CTDB because of the difference between the length of ignored frames.
Differences between pressings can vary by more than just the offset at the edges. For this reason, extending the area of what is ignored can actually be a good thing.
I know that pressings can vary by a lot more than just the offset, but I don't understand the second part of the sentence, I don't see why it would be a good thing to extend the ignored area when this area is already enougth. There is not offset bigger than 5 frames in the database & I even think that 4 frames is already enougth if you only judge by drive in the AR database. 1 frame is not a big deal & I know Spoon chosen 5 frames back when there was virtual drive with big offset in his database which have been purged. A 1 frame margin is also eventually good as an added margin. But Greg adds 5 more frames this sounds enormous to me, he simply doubled the margin of Spoon. I don't understand why so I just ask. Maybe I am wrong.
I admit I haven't investigated the EAC bug, but from what I understand of it, if it's just 2 reapeated frames at the end, it only pushes the margin down to 3 frames at the beginning, even in this case you really need an offset higher than 3 frames to get in trouble. Even with this bug I can understand a choice of 5+2=7 ... but going up to 10 is overkill. Even so, I don't think that bug should affect the technical choice behind CTDB.
I admit, it's easier to attribute it to a scratch , but in this case it has a big chance of being a scratch IMHO.
This post has been edited by sauvage78: May 10 2010, 08:10
|Lo-Fi Version||Time is now: 18th June 2013 - 08:38|