Full Version: 82.5% of my rips are "perfect"
Hydrogenaudio Forums > CD-R and Audio Hardware > CD Hardware/Software
Zennon
Perhaps my recent post over at the EAC forum is better placed here...

http://www.digital-inn.de/exact-audio-copy...ps-perfect.html
Zennon
QUOTE(Zennon;132129)
Ripped music provides an interesting data source for analysis, especially when EAC log files and CUE sheets are available. I thought I should share some results from a preliminary analysis of my rips.

My objective was to assess how "perfect" my rip archive is. I consider a rip as perfect if all the audio data (including analogue noise) on a CD has been extracted without errors. I’m not concerned with replicating the exact same numbers of silent samples at the beginning/ending of a CD, or with extracting the exact sub-channel data (CD text, etc). For simplicity, the below ignores enhanced and copy-protected discs.

So far I have ripped 813 pressed discs, using various EAC versions (up to v0.95b3). My main drive is a LiteOn LTR-48246S (read offset -6 and write offset -6). I use FLAC to compress the WAVs.

I rip my CDs offset-corrected (as per Andre’s reference, not IpseDixit’s) and track-based (gaps appended to next track) so I can use EAC’s test & copy in burst mode and take advantage of AccurateRip. I rerip a disc in secure mode if one or more CRC checksums fail. I keep the EAC log file and extract a (non-compliant) CUE sheet after ripping. In addition, I keep MD5 checksums for all WAVs as well as the AccurateRip report.

Before ripping a disc, I check if a pregap before track 1 exists (F4) and if so, whether it’s silent or not (F3). If the latter is the case, I rerip track 1 index-based, add the resulting 01.00 WAV to the existing tracks and manually edit the CUE sheet to reflect its presence. 156 of my 813 discs have a pregap, 20 of which are non-silent. Typically, the length of the pregap is 0:00:00.42 (58%) or 0:00:00.44 (33%).

So how "perfect" is my rip archive? Conceptually, there are three possible reasons for a rip to be "imperfect", each of which requires a test. If a rip passes all three tests, it is perfect as per my definition above.

TEST #1 - Rip errors. EAC can’t extract all the audio and reports fatal read errors. Unfortunately there’s not much one can do about this, so if the log file reports errors the rip is not "clean" and fails this test. I encountered unrecoverable errors on 13 discs.

TEST #2 - Non-silent pregap before track 1 (or hidden track one audio - HTOA) that can't be extracted. If I encounter a non-silent pregap before track 1, I rebuild the CD image from the track-based files (including the 01.00 file) and then rerip the original CD as an image with both my LiteOn and my secondary drive, an LG GSA-4163B (which is HTOA-capable). A rip passes the pregap test if MD5 checksums for all three images match. Because the LG has a read offset of -667 samples, this does not work for images with <=667 silent end samples, in which case I happily accept a match between checksums of the rebuilt image and the LiteOn rip. So far, only one disc has failed this test. However, I managed to rip its 02:02:61 HTOA with the LG drive and was able to match checksums of the rebuilt image and the LG rip. Thus, so far I have a zero fail rate here.
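For anyone wanting to repeat the checksum comparison in TEST #2, here is a minimal sketch in Python. It only assumes that each image is a single file on disk; the function names md5_of_file and images_match are made up for illustration and are not part of EAC or any other ripping tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(paths):
    """True if every file in `paths` has the same MD5 digest (hypothetical helper)."""
    digests = {md5_of_file(p) for p in paths}
    return len(digests) == 1
```

Note that this hashes the whole file, WAV header included, so the images being compared need to be written with the same header layout for a match to be meaningful.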

TEST #3 - Non-silent begin/end samples. The first and/or last WAV does not begin/end with digital silence. If a pressed CD has no "skew", an (offset-corrected) drive with an offset of -x may have missed up to x non-silent end samples on the CD if the last WAV ends with x silent samples. However, most commercial CDs are actually skewed (sometimes by 10,000s of samples). This results in an "audio window" (including silent begin/end samples present in the audio master) that is shifted either to the left or to the right, spilling over into the lead-in or lead-out, respectively. This is not a problem if it’s only digital begin/end silence that’s spilling over. There are (rather cumbersome) ways to extract missing audio data from discs where non-silent samples are spilling over, but conducting and testing these are beyond the scope of my analysis. My LiteOn has a read offset correction of +6 samples, so I test the end of the last WAV for digital silence. If it has more than 6 samples of digital end silence and the first WAV starts with digital silence (1 or more samples), then the rip passes the silence test.
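The silence test in TEST #3 can be sketched roughly as follows, using Python's standard wave module. This is an illustration under assumptions, not the tool actually used for the analysis: it assumes uncompressed 16-bit PCM WAVs (as ripped from CD), treats one "sample" as one frame across all channels, and the names silent_edge_frames and passes_silence_test are invented here.

```python
import wave


def silent_edge_frames(wav_path):
    """Return (leading, trailing) counts of digitally silent frames in a PCM WAV.

    A frame is one sample across all channels. It counts as silent only if
    every byte is zero, which holds for 16-bit PCM (CD audio).
    """
    with wave.open(wav_path, "rb") as wav:
        frame_size = wav.getnchannels() * wav.getsampwidth()
        data = wav.readframes(wav.getnframes())
    total = len(data) // frame_size
    silent = bytes(frame_size)

    def frame(i):
        return data[i * frame_size:(i + 1) * frame_size]

    leading = 0
    while leading < total and frame(leading) == silent:
        leading += 1
    trailing = 0
    while trailing < total and frame(total - 1 - trailing) == silent:
        trailing += 1
    return leading, trailing


def passes_silence_test(first_wav, last_wav, offset_correction=6):
    """Apply the rule from the post: the first WAV starts with at least one
    silent sample and the last WAV ends with more than `offset_correction`."""
    leading, _ = silent_edge_frames(first_wav)
    _, trailing = silent_edge_frames(last_wav)
    return leading >= 1 and trailing > offset_correction
```

The default of 6 matches the LiteOn's offset correction quoted above; a different drive would need its own value. The whole track is read into memory, which is fine for single tracks but could be changed to read only the edges for very large image files.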

Of my 813 rips, 13 fail the rip errors test, 0 fail the pregap test, and 131 fail the begin/end samples test. These numbers indicate that – as expected – the "skew" of pressed CDs is the primary cause for rips to be imperfect.

There are 142 rips failing one or more tests. This leaves 671 perfect rips, or 82.5% of my rip archive.
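The per-test counts add up to 144 rather than 142 because a disc can fail more than one test. A quick check of the arithmetic, using only the figures quoted above (the overlap of 2 is inferred from them, not separately reported):

```python
total_discs = 813
fail_errors, fail_pregap, fail_silence = 13, 0, 131
fail_any = 142  # discs failing one or more tests, as reported

sum_of_failures = fail_errors + fail_pregap + fail_silence

# With zero pregap failures, any overlap must be discs that fail both the
# rip-errors test and the begin/end samples test.
overlap = sum_of_failures - fail_any

perfect = total_discs - fail_any
share = f"{perfect / total_discs:.1%}"
print(sum_of_failures, overlap, perfect, share)  # 144 2 671 82.5%
```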

I’d be keen to hear if others have conducted similar tests and, if so, what their findings are. Feedback on my rationale would also be much appreciated.


The above strategy for testing my rip archive deliberately omitted AccurateRip because I had always regarded AR results as static, i.e. the AR database is queried at the time the audio is extracted and that's it. I have now re-checked all rips with ARCue.exe, with quite amazing results: the percentage of discs that were accurately ripped is now up from ~20% to 80%.

Obviously, there will never be a definitive answer as to whether a rip is 'perfect' or not (unless one has access to the studio master). Given the nature of DAE, 'perfectness' will always be a matter of probability, and the AR database was designed around that very idea.

I'm now looking to integrate the updated AR results into the approach quoted above - an additional TEST #0 should deal with the AR results in some way, but there are a few problems:

* there are ~150 rips that were 'accurately ripped' with (average) confidence 1. Most likely I'm getting my own rip results back here because the bulk of my rips were done under a previous OS installation (same machine and drives). Obviously, letting accurate rips with confidence 1 pass TEST #0 is not a good idea.

* an interesting finding is that 7 of the 8 'not accurate' rips have 100% matching CRCs in the EAC log. Three of those were ripped in secure mode (Test & Copy), the other four in burst mode (Test & Copy). These results could indicate false positives (possible even with EAC's secure ripping mode) and/or consistent errors. I'm happy to accept those if the AR confidence is, say, 5 or more. But with lower confidence it becomes hard to say which rip is the right one - mine or the rip(s) stored in the AR database?
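One possible shape for TEST #0, given the two problems above, is a three-way outcome instead of a plain pass/fail. The sketch below is only a suggestion: the function name ar_test_zero is made up, the rejection threshold of 5 comes from the "say, 5 or more" above, and the pass threshold of 2 is purely an assumption (the only firm point is that confidence 1 should not pass).

```python
def ar_test_zero(accurate, confidence, min_pass=2, min_reject=5):
    """Classify an AccurateRip result as 'pass', 'fail' or 'inconclusive'.

    accurate   -- True if the rip matched an entry in the AR database
    confidence -- the AR confidence reported for that entry
    min_pass   -- assumed threshold; confidence 1 may just be one's own submission
    min_reject -- confidence at which a mismatch is taken to mean the rip is wrong
    """
    if accurate:
        return "pass" if confidence >= min_pass else "inconclusive"
    return "fail" if confidence >= min_reject else "inconclusive"


print(ar_test_zero(True, 1))   # inconclusive
print(ar_test_zero(True, 12))  # pass
print(ar_test_zero(False, 5))  # fail
print(ar_test_zero(False, 2))  # inconclusive
```

Rips that come back 'inconclusive' would then fall through to the other three tests, so TEST #0 adds information without overriding them.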

Again, the conclusion is that there's NO certainty in the wonderful world of DAE.. ;-(