AR confidence and one person's opinion of 'best practices'

2010-01-29 04:13:10

Quote from: Axmann on 2010-01-28 08:08:50

In your answers, give absolutely NO regard to filesize, compatibility, or time needed to encode/decode.

Which format gives you THE highest quality sound you can squeeze out of a CD?

A music CD, EAC-ripped, test and copy checksum matched, AccurateRip w/ confidence level above 5 (I say 5 for me because I seem to have a lot of rare or import versions), ripped to WAV, and then encoded with any lossless encoder of your choosing. Getting the music off the CD accurately is of as much importance as the lossless encoder you choose afterwards.

Reply #1 – 2010-01-29 19:10:39

Quote from: hellokeith on 2010-01-29 04:13:10

AccurateRip w/ confidence level above 5 (I say 5 for me because I seem to have a lot of rare or import versions)

I'd like to see your justification on technical grounds for choosing 5 irrespective of the rarity of your titles. Did you come up with that number from rolling a die or something?

Reply #2 – 2010-01-29 21:38:04

Quote from: greynol on 2010-01-29 19:10:39

Quote from: hellokeith on 2010-01-29 04:13:10
AccurateRip w/ confidence level above 5 (I say 5 for me because I seem to have a lot of rare or import versions)

I'd like to see your justification on technical grounds for choosing 5 irrespective of the rarity of your titles. Did you come up with that number from rolling a die or something?

What problems do you have with this number? It's huge.
The chance that 5 selected people had wrong rips, yet had the same results, is ca. 1/((2^32)^5). Actually slightly higher, because the number assumes that no 2 of them have exactly the same scratches.
That's for one particular choice of 5 people, and there is a problem if any such 5 exist, so because of the birthday paradox the real margin is much smaller. I don't care enough to calculate it; it's still huge.
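
To put a number on that estimate, here is a minimal sketch. It assumes each wrong rip produces an independent, uniformly random 32-bit checksum, which is exactly the assumption questioned later in the thread; the function name and `HASH_BITS` are made up for illustration.

```python
from fractions import Fraction

HASH_BITS = 32  # assumed width of one checksum


def chance_all_match_by_accident(confidence, hash_bits=HASH_BITS):
    """Chance that `confidence` independent, uniformly random checksums
    all equal one given (wrong) value. Assumes an ideal hash."""
    return Fraction(1, 2 ** (hash_bits * confidence))


p5 = chance_all_match_by_accident(5)
print(p5 == Fraction(1, (2 ** 32) ** 5))  # same figure as in the post
print(float(p5))                          # as a decimal
```

The figure only holds for an ideal hash and independent errors; consistent errors or a weak checksum make the real chance larger.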

Reply #3 – 2010-01-29 21:46:57

I have a problem because the number has no technical basis.

Your assumption about the database and scratches is an underestimation since it has already been shown on more than one occasion that the database has given false positives on consistent errors (due to either firmware defect, manufacturing defect, software defect, or any combination of the three). I'd also like to add that the hash values are far more prone to collision than your math suggests.

Reply #4 – 2010-01-29 22:21:17

Quote from: greynol on 2010-01-29 21:46:57

I have a problem because the number has no technical basis.

Your assumption about the database and scratches is an underestimation since it has already been shown on more than one occasion that the database has given false positives on consistent errors (due to either firmware defect, manufacturing defect, software defect, or any combination of the three).

OK. I'm not a technical person, I believe your words.

Quote from: greynol on 2010-01-29 21:46:57

I'd also like to add that the hash values are far more prone to collision than your math suggests.

But I'm a mathematician and I'd like to see some details here. It's not that I believe that its hash function is perfect, but your statement is very bold.

Reply #5 – 2010-01-29 22:28:49

Quote from: _m²_ on 2010-01-29 22:21:17

But I'm a mathematician and I'd like to see some details here. It's not that I believe that its hash function is perfect, but your statement is very bold.

http://www.hydrogenaudio.org/forums/index....showtopic=61468

Reply #6 – 2010-01-29 22:36:56

Quote from: greynol on 2010-01-29 22:28:49

Quote from: _m²_ on 2010-01-29 22:21:17
But I'm a mathematician and I'd like to see some details here. It's not that I believe that its hash function is perfect, but your statement is very bold.

http://www.hydrogenaudio.org/forums/index....showtopic=61468

Thank you, I'll read it later.

Reply #7 – 2010-01-30 00:00:46

I really shouldn't have said far more, but it is not as good as the 1 in 4 billion figure that was commonly thrown around. I am by no means even close to being an authority on cryptography, but summing the amplitude of each sample multiplied by its address and discarding the most significant half of the bits doesn't seem to me as being all that robust.
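
The scheme described above (each sample multiplied by its address, summed, with the upper half of the bits thrown away) can be sketched as a toy. This is not AccurateRip's actual code: treating each stereo frame as one 32-bit word, the 1-based position, and the function names are all assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def weighted_sum_checksum(words):
    """Sum of word * (1-based position), keeping only the low 32 bits."""
    total = 0
    for position, word in enumerate(words, start=1):
        total += (word & MASK32) * position
    return total & MASK32


def flip_top_bit(words, index):
    """Copy of `words` with bit 31 of words[index] flipped (0-based index)."""
    damaged = list(words)
    damaged[index] ^= 0x80000000
    return damaged


frames = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEF]
clean = weighted_sum_checksum(frames)

# Position 2 is even: the change is 2**31 * 2, a multiple of 2**32, so it vanishes.
hidden = weighted_sum_checksum(flip_top_bit(frames, 1))
# Position 1 is odd: the change is 2**31, which survives the truncation.
seen = weighted_sum_checksum(flip_top_bit(frames, 0))

print(hidden == clean, seen == clean)
```

In this toy, a flipped top bit at any even position contributes a multiple of 2^32, which the truncation discards, so the checksum cannot see it.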

So long as these other exceptions aren't occurring, you only need an AR confidence of 1 to be reasonably certain your rip was error-free, provided it was not your own previous submission. If it is, then it's on the same level as getting matching T&C CRCs, assuming the AR hash is robust enough. If the confidence of 1 is your previous submission, then 2 gets you back to reasonable certainty, assuming that second submission isn't also yours after getting a new AR ID number from reinstalling your operating system after your previous submission (as an example).

If you get positive AR verification, then there is no need to have EAC generate a second CRC (another small issue I have with hellokeith's post). If you are going to generate a second CRC, it's best to do it with a drive based on a completely different chipset in order to help rule out one of the issues I raised a couple of posts back. However, I do agree with the point he's trying to make, which is that ripping accuracy means something whereas the lossless format you choose, based on the way the OP posed the question, is completely irrelevant; lossless is lossless.

Reply #8 – 2010-01-30 00:19:45

As an aside prior to responding, I have noticed that historically you are quite passionate about AR. I do appreciate that, and my opinion of you is that you are a very intellectually honest person.

Quote from: greynol on 2010-01-29 19:10:39

I'd like to see your justification on technical grounds for choosing 5 irrespective of the rarity of your titles. Did you come up with that number from rolling a die or something?

The number is based on three things:

First, my ripping best practices. If at all possible, I only purchase new CD's, and I rip w/ EAC first thing out of the shrink wrap. If it was purchased used, or is a CD that I purchased new but have had for a while, I carefully examine it and remove any foreign substances from the surface. If it is very dirty or badly scratched, I run it through my SkipDR until I am convinced it is as clean and resurfaced as it is ever going to get. For new CD's out of shrink wrap, my expectation is that the AR confidence number, if I get one, is sufficient because the CD has had no opportunity to get dirty or scratched through my doing. For used / scratched CD's, if I'm getting low confidence numbers, I will do 2 or 3 rips at different modes and compare.

Second, my experience with AR. That is an average confidence level I get with popular CD titles (I don't have many I would consider popular, i.e. radio airplay, availability in retail outlets, etc.) and a (much lower) average confidence level I get with the rest (majority) of my CD's.

Third and last, the guidance from the dBpoweramp web page:

Quote

When AccurateRip is operating it will report a message next to a track such as 'Accurate (12)': this reports your rip matched 12 other peoples rips (the confidence number), anything above a confidence of 1 can be relied upon.

Anything above a confidence of 1 can be relied upon. That along with my ripping best practices and historical averages makes me feel pretty good about 5 or higher.

Reply #9 – 2010-01-30 00:25:56

So 5 is just a warm and fuzzy feeling for you and is not based on any technical understanding of AccurateRip.

Regarding the confidence of 1, I've already addressed it and you will see Spoon say exactly the same thing here.

The problem is that everyone wants a hard number, and there is no hard number to give. Another problem is that people want to over-simplify (as in the dBpoweramp web page). My problem is when someone like yourself presents a hard number, others run with it as if it were the truth. People don't seem to want to think critically about these things and then the misinformation spreads. It's a pet peeve of mine, which is why I tend to be a dick about these things.

It would have been just as easy to say that all you need is a confidence of 1 provided it wasn't your submission, while continuing to ignore the other exceptions, which guessing at how far to raise the confidence level above 1 would not properly address anyway.

Reply #10 – 2010-01-30 07:28:00

Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't give more than 97% correctness. It really shouldn't be relied upon. I expected at least five nines. The topic linked is quite old, I guess that the author didn't fix it, right?
Thanks again for the link, it's a huge blow, but it's certainly good to be aware.

Reply #11 – 2010-01-30 07:32:28

Quote from: greynol on 2010-01-30 00:25:56

So 5 is just a warm and fuzzy feeling for you and is not based on any technical understanding of AccurateRip.

Now slow down there Tex. I have a basic understanding of how AccurateRip works, otherwise I wouldn't go to the trouble of using it as a basis for some rather time-intensive rip and re-rip and re-rip again sessions on a handful of favorite CD's which are scratched to high hell because I played them a thousand times over the years.

In the thread you linked, Spoon said:

Quote

at a 97% coverage, I stand behind AccurateRip @ 97% is better than most (? all) c2 implementations

Has Spoon since recanted this? Has this number dropped substantially? Have we thrown out the Test & Copy procedure crc's AND thrown out the confidence level? Is there some new math which shows that all these different optical drives on different PC's scattered across the globe used in rips done by different people at different times (years apart even) actually just ended up giving the exact same CRC values multiple times? If so, why is anyone even still using AR?

Reply #12 – 2010-01-30 08:04:12

Hopefully not too much OT:
Will a new (different) glassmaster change AR data?
Mastering is part of my job and I have to prepare a few old cd-masters for replication in a new cd-factory.
Although I'm mostly interested in audio data integrity, it doesn't harm to get the non-audio details right.
Do LBRs (laser beam recorders) have a consistent offset or does this depend on individual calibration?
(I'm sending cd-masters in the DDP image file format, so no write-offsets in my part of the chain)

Reply #13 – 2010-01-30 13:03:48

Quote from: _m²_ on 2010-01-30 07:28:00

Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't give more than 97% correctness.

That is NOT what was said in the linked thread. I thought you were a mathematician?

Reply #14 – 2010-01-30 15:32:38

Quote from: Soap on 2010-01-30 13:03:48

Quote from: _m²_ on 2010-01-30 07:28:00
Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't give more than 97% correctness.

That is NOT what was said in the linked thread. I thought you were a mathematician?

There's always 3% of my data that is not included in the checksum, isn't that what was said there? If there's exactly one error, then it has a 3% chance of not being detected. That's what I meant by the words above. There's also some probability that there are multiple errors, but it depends on data that I don't have, so I left it out.
And there is the probability that an incorrect value accidentally matched. It should be almost none, but I stopped assuming that the CRC code is fine.
OK, the most useful metric would be the probability of error given that AR returns 'It's fine'. But that depends on the probability of ripping errors and therefore we won't get it.
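
That last metric can at least be written down with Bayes' rule, even if the inputs are unknown. A minimal sketch under loud assumptions: a bad rip contains exactly one error and slips through with probability 3%, a clean rip always passes, and the prior error rates in the loop are hypothetical placeholders, not measurements.

```python
def error_given_pass(p_rip_error, p_miss=0.03):
    """P(rip is bad | checksum says 'fine') by Bayes' rule.

    p_rip_error: assumed prior chance that a rip contains an error.
    p_miss:      assumed chance that a bad rip still passes the check.
    """
    passed_and_bad = p_rip_error * p_miss
    passed_and_good = 1.0 - p_rip_error  # clean rips are assumed to always pass
    return passed_and_bad / (passed_and_bad + passed_and_good)


# Hypothetical priors, only to show how strongly the answer depends on them.
for prior in (0.001, 0.01, 0.1):
    print(prior, error_given_pass(prior))
```

The result moves with the prior, which is the point made above: without the real rate of ripping errors the number cannot be pinned down.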

Reply #15 – 2010-01-30 15:37:43

Your full quote was:

Quote from: _m²_

Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't give more than 97% correctness. It really shouldn't be relied upon. I expected at least five nines. The topic linked is quite old, I guess that the author didn't fix it, right?
Thanks again for the link, it's a huge blow, but it's certainly good to be aware.

Correct me where I'm wrong, but three percent of the data not being used in the "CRC" calculation is nowhere near the same thing as the confidence level having "97% correctness".
The collision rate being higher than it should be is not "a huge blow", much less grounds for your conclusion that "it really shouldn't be relied upon".

Reply #16 – 2010-01-30 16:03:05

Quote from: Soap on 2010-01-30 15:37:43

Your full quote was:
Quote from: _m²_
Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't give more than 97% correctness. It really shouldn't be relied upon. I expected at least five nines. The topic linked is quite old, I guess that the author didn't fix it, right?
Thanks again for the link, it's a huge blow, but it's certainly good to be aware.

Correct me where I'm wrong, but three percent of the data not being used in the "CRC" calculation is nowhere near the same thing as the confidence level having "97% correctness".
The collision rate being higher than it should be is not "a huge blow", much less grounds for your conclusion that "it really shouldn't be relied upon".

Bad wording on my side. I meant "it can't warrant more than 97% correctness". I take it as my fault and will correct the message above.
I stand by "it really shouldn't be relied upon".
Actually I wish somebody else wrote something like AR. Because even if it's ever fixed, I don't think I'll trust that there are no more bugs as bad as this one in AR.

ADDED: Because of this forum policy, I can't fix the message above. Can any mod do it for me, please?

Reply #17 – 2010-01-30 16:43:40

Actually, the chance of AccurateRip not detecting an error is much much less than 3%. I'd say it's pretty close to 1/2^32, but that would be only a guess.

You forget to take into account some important factors.

First, the CD has an error correction algorithm, which reorders bits. If only several bits in the sector were read incorrectly, error correction will fix that. If more bits were read incorrectly, the resulting sector will have many invalid non-consecutive bits, which would definitely affect the AccurateRip CRC.

Second, if your rip matches several pressings with different offsets, the probability of non-detected error decreases with each such pressing, so even if it was 3%, it would be 0.09% with two matching pressings.
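
The arithmetic behind the 0.09% figure, as a minimal sketch. It assumes each matching pressing misses a given error independently with probability 3%; that independence is an assumption of the sketch, and the function name is made up for illustration.

```python
from fractions import Fraction


def miss_after_pressings(n_pressings, p_miss=Fraction(3, 100)):
    """Chance that one error goes undetected by all `n_pressings` checks,
    assuming each check misses it independently with probability p_miss."""
    return p_miss ** n_pressings


two = miss_after_pressings(2)
print(two)                # exact fraction
print(float(two) * 100)   # as a percentage
```

With two pressings the exact value is 9/10000, i.e. 0.09%. If the checks always skip the same data, the independence assumption fails and the figure stays at 3%.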

Reply #18 – 2010-01-30 16:52:07

Quote from: Gregory S. Chudov on 2010-01-30 16:43:40

If more bits were read incorrectly, the resulting sector will have many invalid non-consecutive bits.

Doesn't sound good. I ask for more details.

Quote from: Gregory S. Chudov on 2010-01-30 16:43:40

Second, if your rip matches several pressings with different offsets, the probability of non-detected error decreases with each such pressing, so even if it was 3%, it would be 0.09% with two matching pressings.

I've never seen AR returning information about the number of pressings matched. But I don't think it matters anyway - (now goes a 100% guess from my side) AR probably always skips the same 3%.
If the error (all errors) is skipped, it doesn't matter how many pressings it matches.

Reply #19 – 2010-01-30 17:00:11

Quote from http://en.wikipedia.org/wiki/Cross-interle...-Solomon_coding : CIRC corrects error bursts up to 3,500 bits in sequence

When you have more than 3,500 erroneous bits, the chance of all of them falling into the 3% that are left out by ArCRC is astronomically small.
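
A rough sense of "astronomically small", as a crude sketch only. It assumes each erroneous bit lands in the skipped 3% independently of the others; bits in a real burst are neither independent nor uniformly spread, so this is an order-of-magnitude illustration, and the function name is hypothetical.

```python
import math


def log10_all_errors_skipped(n_error_bits, skipped_fraction=0.03):
    """log10 of the chance that every one of n_error_bits falls into the
    skipped fraction, assuming each bit does so independently."""
    return n_error_bits * math.log10(skipped_fraction)


exponent = log10_all_errors_skipped(3500)
print(f"about 10^{exponent:.0f}")
```

The logarithm is used because 0.03 ** 3500 is far below the smallest positive float and would simply underflow to zero.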

And if you use cuetools or fb2k's verifier, you get information about other pressings. They don't skip the same 3%.

Reply #20 – 2010-01-30 17:18:08

Implementation failure is only on the right channel. Please describe a read error style which does not impact both channels.

Reply #21 – 2010-01-30 17:22:12

Quote from: Gregory S. Chudov on 2010-01-30 17:00:11

Quote from http://en.wikipedia.org/wiki/Cross-interle...-Solomon_coding : CIRC corrects error bursts up to 3,500 bits in sequence

When you have more than 3,500 erroneous bits, the chance of all of them falling into the 3% that are left out by ArCRC is astronomically small.

Interesting read. But not specific enough and doesn't answer my primary concern (not expressed well in the previous post...):
Let's assume one has 3501 errors. What's left after correction? Can it be 1 (possibly approximated) error?

Quote from: Gregory S. Chudov on 2010-01-30 17:00:11

And if you use cuetools or fb2k's verifier, you get information about other pressings. They don't skip the same 3%.

So AR calculates a separate checksum several times, right? If yes, it's fine.

ADDED:

Quote

Implementation failure is only on the right channel. Please describe a read error style which does not impact both channels.

Prove there is none.

Reply #22 – 2010-01-30 17:37:28

Quote from: _m²_ on 2010-01-30 17:22:12

Quote
Implementation failure is only on the right channel. Please describe a read error style which does not impact both channels.

Prove there is none.

A - For my next trick I'll prove the non-existence of God.
B - I don't need to. You're the one making the shaky claim.
C - Not even looking for "proof". Just show, knowing how the data is written on a CD, how damage/read failures could escape impacting both channels.

Reply #23 – 2010-01-30 17:48:56

From "Inconvenient Facts About Hydrogenaudio":

Quote

No. 3: The inflow of douche never terminates and it can only be controlled to a certain extent.

Reply #24 – 2010-01-30 17:52:48

Quote from: Soap on 2010-01-30 17:37:28

Quote from: _m²_ on 2010-01-30 17:22:12
Quote
Implementation failure is only on the right channel. Please describe a read error style which does not impact both channels.

Prove there is none.

A - For my next trick I'll prove the non-existence of God.
B - I don't need to. You're the one making the shaky claim.
C - Not even looking for "proof". Just show, knowing how the data is written on a CD, how damage/read failures could escape impacting both channels.

I claim that you can't give any serious warranties on AR being secure. (God related ones are not serious for me.)
You seemed to want to contradict that, and I don't see how you can do it without actually giving such warranties and proving they are serious.
