Does subtracting MP3 from WAV reveal artifacts?

May I assure you that there is no practical way to encode an mp3 file better than with a standard phase-preserving quantization of coefficients of a phase-preserving filter bank.

So you are saying that the mp3 standard disallows the use of parametric encoding?

Reply #27 – 2010-11-07 19:14:26

Quote from: krabapple on 2010-11-07 08:58:39

Quote from: pdq on 2010-11-07 17:21:38
There is nothing in the mp3 standard that says that you have to do anything of the kind. If current implementations DO preserve timing and phase then that may be because it is easiest to implement that way. If someone discovered a way to encode to mp3 such that the timing and phase were not preserved, but the result was perceptually transparent at lower bitrates, then everybody would switch to doing it that way.

May I assure you that there is no practical way to encode an mp3 file better than with a standard phase-preserving quantization of coefficients of a phase-preserving filter bank.

Providing a mathematical proof of that would probably be a pretty good master's thesis project... ;-)

Reply #28 – 2010-11-07 19:21:16

Quote from: JapanAudio on 2010-11-07 05:37:59
I am not disputing this: To assess or compare lossy compression performance, you NEED to listen to the lossy files because it is difficult to determine what kind of noise may or may not be heard combined with the main track. I am only saying that listening to the discarded noise reveals the exact artifacts that are introduced by the encoder.

So what? It doesn't tell you whether they are audible in context.

Where's the line between audible and inaudible? ...in context? If I can't hear you a mile away, how about right in front of me... in the context of a jet engine running?
Lossy coding has the purpose of achieving a target bitrate while keeping acceptable signal quality for the intended use. For music, people have decided to go for 'perceived human hearing' as a model. Perhaps you'll be surprised to know that the MP3 standard doesn't specify any normative implementation of the psychoacoustic model; it's all up to the coder to decide.

With music we go for high fidelity, namely the most exact rendition of a recording. Is it justified to remove noises that are not naturally prone to being perceived by the human ear? Yes, because this is the best compromise to achieve target bitrate and retain quality. Is this a hifi process? Not really (in my definition), because these noises are not given a chance, they're simply dulled out when normally they would sparkle (high pitch) through the audio. So if you want to know exactly what kind of sparkle is being taken away from your music, I suggest hearing them alone. Note that it might not give you insight on subjective performance, but it does reveal exactly what is left out. I will post some audio samples so you can really get a feel for it.

Quote from: pdq on 2010-11-07 17:21:38

Quote from: Alexey Lukin on 2010-11-07 08:10:09
I disagree!
Mp3 encoding does preserve timing and phase information. Unlike parametric coders, Mp3 strives to preserve the waveform, including its phase.

There is nothing in the mp3 standard that says that you have to do anything of the kind. If current implementations DO preserve timing and phase then that may be because it is easiest to implement that way.

If someone discovered a way to encode to mp3 such that the timing and phase were not preserved, but the result was perceptually transparent at lower bitrates, then everybody would switch to doing it that way.

In fact, the only thing that the mp3 standard specifies is how to decode an mp3 file to wav.

Phase and timing aren't the same thing... Phase cannot be preserved, because you add noise that's reflected in both time and freq domains. But timing doesn't change with respect to this noise.

About your last sentence... I have the ISO/IEC 11172-3 standard right in front of me, and... hmm... it's not true. Decoder architecture is only part of the standard, amongst other parts like Encoding and Storage.

Reply #29 – 2010-11-07 19:35:35

Your jet engine analogy shows that you understand the concept of masking, but you then go back to your misconceptions...

....they're simply dulled out when normally they would sparkle (high pitch) through the audio.

No, they wouldn't normally sparkle. You can't hear them in the context of the rest of the music, because they would be masked. If you can hear them in the original but detect a loss of "sparkle" in the encoded version, then output from the lossy codec would be easily detectable in an ABX test. There's no point retaining something that your ear can't hear - that's the basis of psychoacoustic modelling in lossy codecs.

So if you want to know exactly what kind of sparkle is being taken away from your music, I suggest hearing them alone.

Great, but it doesn't tell you anything because you are ignoring the signal that is masking the "sparkle".

Note that it might not give you insight on subjective performance

Finally, you have got it. As has been said numerous times in this and other threads, listening to the difference gives you absolutely no insight into the subjective performance of the lossy codec. Only an ABX test can do that.
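For readers who have not scored an ABX run before, here is a minimal sketch in Python of how a result is usually judged. It assumes every trial is an independent 50/50 guess when no difference is audible; the function name `abx_p_value` and the example scores are illustrative, not taken from any post in this thread.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing alone."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of a binomial distribution with p = 0.5.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# 12 of 16 correct would happen by luck in roughly 4% of runs;
# 9 of 16 correct would happen by luck in roughly 40% of runs.
print(abx_p_value(12, 16))
print(abx_p_value(9, 16))
```

A low value suggests the listener really did hear a difference; a high value means the score is indistinguishable from guessing, however convincing the difference file sounds on its own.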

Reply #30 – 2010-11-07 19:39:49

Quote from: Arnold B. Krueger on 2010-11-07 19:06:32

May I assure you that there is no practical way to encode an mp3 file better than with a standard phase-preserving quantization of coefficients of a phase-preserving filter bank.

Exactly. Keeping phase info is not desirable.

Quote from: greynol on 2010-11-07 18:37:55
I'd like to see DVDdoug defend his comment regarding phase response.

I'd like to hear some comments from people who have actually taken say a 44/16 400 Hz square wave .wav file, converted it into a 128 kb MP3, and then compared the reconstructed .wav file to the original wav file.

There is a fatal flaw in this experiment, because an ideal square wave cannot be represented in PCM. The maximum representable frequency at a 44.1 kHz sample rate is 22050 Hz, and a perfect square wave has harmonics extending to infinite frequency because of its instantaneous edges. Therefore the sampled signal is aliased.

So you can't sample a non-bandlimited signal (e.g. a square wave) without serious aliasing unless you prefilter it with a lowpass at 22050 Hz (smoothing it). In this case the experiment would be more reliable, but your signal would be a smoothed out square wave and could still be a bit aliased depending on the nature of your prefilter and its performance. All mathematical operations from there would be valid.

To visualize what kind of waveform your seemingly perfect square wave ought to be, you can ideally interpolate it with the sinc function. You will inevitably notice a lot of ringing near the edges (equivalent to ghosting in imaging).
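A rough way to see this numerically, sketched in Python under a few assumptions: the band-limited version is built by summing only the odd harmonics below 22050 Hz (which is what an ideal brick-wall lowpass would leave), and the names `naive_square` and `bandlimited_square` are made up for this illustration.

```python
import math

FS = 44100   # sample rate in Hz
F0 = 400     # fundamental of the square wave in Hz

def naive_square(n_samples):
    """Sample an ideal square wave directly; its harmonics above FS/2 alias."""
    return [1.0 if math.sin(2 * math.pi * F0 * n / FS) >= 0 else -1.0
            for n in range(n_samples)]

def bandlimited_square(n_samples):
    """Fourier-series square wave using only odd harmonics below Nyquist."""
    harmonics = [k for k in range(1, FS // (2 * F0) + 1, 2) if k * F0 < FS / 2]
    return [(4 / math.pi) * sum(math.sin(2 * math.pi * k * F0 * n / FS) / k
                                for k in harmonics)
            for n in range(n_samples)]

naive = naive_square(441)
smooth = bandlimited_square(441)

# The naive version never leaves +/-1; the band-limited one overshoots
# near each edge, which is the ringing described above.
print("naive peak:       ", max(abs(x) for x in naive))
print("band-limited peak:", max(abs(x) for x in smooth))
```

The overshoot is not an encoder artifact: it is already present in any correctly band-limited square wave, so it should be expected in the reference file before MP3 enters the picture.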

Reply #31 – 2010-11-07 20:16:49

Quote from: Ouroboros on 2010-11-07 12:18:25

Quote from: HibyPrime on 2010-11-07 05:43:04
That's the goal of psychoacoustics, the fact that many people claim to hear the difference between lossless and lossy says that they haven't quite achieved that goal yet - at least at 320 kbps and under.

That's an incorrect generalisation - it's only true for a very few killer samples, and for a few people. In general, modern lossy codecs (MP3 / AAC / OGG) achieve perceptual transparency at much lower bit rates for the vast majority of people and for the vast majority of music. That's what all of the properly conducted tests have revealed, and that's why many people who do their own tests settle on LAME -V2 (around 220 kb/s) or even lower for their MP3 encoding.

Quote from: HibyPrime on 2010-11-07 05:43:04
When I started ripping/downloading all of my music in FLAC, I only did it because I had just gotten a bigger hard drive, and saw no reason to keep ripping in MP3. I ran ABX tests in foobar and would consistently fail to tell a difference between +192 kbps MP3s and FLAC. Over time I started to notice things I've never heard before in tracks I've had for a long time, but I would still fail ABX tests. At this point you would assume that the only difference is all in my head, but when I heard the difference file in the youtube link it sounds a lot like the sounds that I occasionally notice in FLAC files. Note I'm referring to the lower volume part of the clip, once it hits the louder part it just sounds like a mess.

It is illogical to claim that the difference file on Youtube sounds like the sounds you occasionally notice in FLAC files. If you can't ABX it then you clearly can't notice the difference, so it's all in your head i.e. it's a placebo effect.

Again, the psychoacoustic models in lossy codecs exploit the way your ears and brain perceive sound, and ABX is the only generally available method that measures your perception of the sound. Looking at the difference file, or listening to it in isolation, tells you nothing.

Actually, it is a perfectly logical statement. If the goal of the MP3 encoder is to mask the sounds that you (hopefully) would never hear anyway, it is logical to assume that pinpointing those sounds in an ABX test would be nearly impossible.

Not being able to pinpoint something does NOT mean it doesn't exist.

This is starting to look like something that will break TOS#8, so I'll put this disclaimer here; my argument is that the mathematical difference between a WAV and MP3 file of the same song can help to reveal the differences between lossy and lossless. I am not arguing that either one is better.

Reply #32 – 2010-11-07 20:18:43

Quote from: pdq on 2010-11-07 19:10:30

So you are saying that the mp3 standard disallows the use of parametric encoding?

Yes, to the best of my knowledge.

Quote from: Arnold B. Krueger on 2010-11-07 19:06:32

Phase and timing aren't the same thing... Phase cannot be preserved, because you add noise that's reflected in both time and freq domains. But timing doesn't change with respect to this noise.

Well, phase is impacted by quantization noise just as much as amplitude. However there's no preference in mp3 for preserving amplitude versus preserving phase. When bit rate is high, both amplitude and phase are well preserved.

It's fair to say that phase information is neither totally lost, nor accurately preserved.

I'd say that distortion to phase information is "comparable" to distortion of amplitude information (although it's truly like apples vs. oranges).

Reply #33 – 2010-11-07 20:25:11

Quote from: Ouroboros on 2010-11-07 19:35:35

Quote from: JapanAudio on 2010-11-07 19:21:16
So if you want to know exactly what kind of sparkle is being taken away from your music, I suggest hearing them alone.
Great, but it doesn't tell you anything because you are ignoring the signal that is masking the "sparkle".

If by "masking" you mean dissimulating... OK.
Keep in mind that it's not a logical "AND" mask like in computing, e.g. '11001100' AND '12345678' = '12005600'; there the original signal (the second operand) is irretrievable using only the mask and output. (You can apply such an operation in the frequency domain to implement bandpass filters and whatnot.)

But here we're talking about temporal masking, e.g. (let's take bigger values) '1 0 -2 3' + '100 50 75 53' = '101 50 73 56'; here the signal is retrievable using only the mask and output. Saying that the noise is completely masked temporally is quite a stretch... though it may be well dissimulated (input and output are very similar).
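The contrast between the two kinds of "mask" can be sketched in a few lines of Python. The helper names and the short lists are illustrative values in the spirit of the ones above, not real audio data.

```python
def apply_and_mask(mask, signal):
    """Gate-style masking: keep a sample where the mask is 1, zero it otherwise."""
    return [s if m else 0 for m, s in zip(mask, signal)]

def add_noise(noise, signal):
    """Additive case: the output is simply signal plus noise, sample by sample."""
    return [n + s for n, s in zip(noise, signal)]

# Gate-style mask: the zeroed samples cannot be recovered from mask and output.
mask = [1, 1, 0, 0, 1, 1, 0, 0]
signal = [1, 2, 3, 4, 5, 6, 7, 8]
print(apply_and_mask(mask, signal))        # [1, 2, 0, 0, 5, 6, 0, 0]

# Additive noise: subtracting the known noise gives the signal back exactly.
noise = [1, 0, -2, 3]
music = [100, 50, 75, 53]
output = add_noise(noise, music)
recovered = [o - n for o, n in zip(output, noise)]
print(output, recovered)
```

This only shows that the added noise is mathematically recoverable by subtraction. Whether it is audible underneath the music is a separate, perceptual question.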

It's really not surprising that many prefer CD quality over MP3s. On the other hand, not everyone has the same auditory ability, or hifi stereo chain, so in the same way it's no surprise that many people can't distinguish one from the other: MP3 is designed to have well dissimulated noise!

Quote from: Ouroboros on 2010-11-07 19:35:35

Quote from: JapanAudio on 2010-11-07 19:21:16
Note that it might not give you insight on subjective performance
Finally, you have got it. As has been said numerous times in this and other threads, listening to the difference gives you absolutely no insight into the subjective performance of the lossy codec. Only an ABX test can do that.

I never disputed this claim. What I defend is the fact that subtracting MP3 from WAV objectively reveals encoding noise (artifacts).

Reply #34 – 2010-11-07 20:32:24

That doesn't change the fact that the person who compiled the youtube video is completely clueless about how to assess the sound quality of a lossy encoding.

No one here is disputing that subtracting one file from another can leave something audible. The point is that what is audible has no meaning if it is masked by the sound that has been omitted through the unnatural manipulation of subtracting out the original lossless signal.

Reply #35 – 2010-11-07 20:45:27

May I assure you that there is no practical way to encode an mp3 file better than with a standard phase-preserving quantization of coefficients of a phase-preserving filter bank.

Ok, here's what I did. I took first 4096 samples from "testcase.wav" included with LAME distribution, encoded them with lame -b 128, decoded with lame --decode and did FFT on both original and decoded. The phases of resulting matrices are far from identical. So the phase spectrum is not preserved, or am I missing something?

Reply #36 – 2010-11-07 20:48:56

Actually, it is a perfectly logical statement. If the goal of the MP3 encoder is to mask the sounds that you (hopefully) would never hear anyway, it is logical to assume that pinpointing those sounds in an ABX test would be nearly impossible.

No, your original statement is illogical. You said:

Quote

when I heard the difference file in the youtube link it sounds a lot like the sounds that I occasionally notice in FLAC files

If you noticed those sounds in the FLAC file but not in the MP3 file then the codec would "fail" an ABX test. If it "passes" the ABX test (i.e. you can't detect the difference in a double blind test) then by definition you CAN'T hear those sounds in the FLAC file. Hence your statement is illogical (unless you are deliberately using a lossy codec that creates audible artifacts).

Not being able to pinpoint something does NOT mean it doesn't exist.

In a perceptual listening test it means almost exactly that. If you can't hear the difference, it effectively doesn't exist. The fact that it exists mathematically (when you subtract A from B) is a red herring.

Quote from: alexeysp on 2010-11-07 20:45:27

my argument is that the mathematical difference between a WAV and MP3 file of the same song can help to reveal the differences between lossy and lossless. I am not arguing that either one is better.

It can reveal the mathematical differences, but not the audible differences. You are completely ignoring the fact that the noise that is removed is removed precisely because it is inaudible when played in the presence of the rest of the reconstructed signal. If it were audible as part of the original signal then the codec would "fail" an ABX test.

Reply #37 – 2010-11-07 20:52:19

Perhaps I need to go back and re-read, but I don't believe anyone is disputing that neither amplitude nor phase in the frequency domain are perfectly preserved during lossy encoding. I just think DVDdoug's overly-simplistic* comment was merely off-the-cuff. I'm hoping someone with greater expertise will chime in, though I doubt it since this thread was washed up before it even began.

(*) to the point that it does more harm than good.

Reply #38 – 2010-11-07 21:21:44

I took first 4096 samples from "testcase.wav" included with LAME distribution, encoded them with lame -b 128, decoded with lame --decode and did FFT on both original and decoded. The phases of resulting matrices are far from identical. So the phase spectrum is not preserved, or am I missing something?

1. Check if the signals are aligned in time.
2. Phase spectrum of FFT is not the same as "phase information". You have to understand what you are looking for and take care of several things: weighting windows; analysis of spectrum sparsity; only paying attention to phase of significant spectrum components; etc.
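To make those two points concrete, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the test signal is synthetic, small additive noise stands in for coding noise (it is not real LAME output), a slow textbook DFT replaces a proper FFT library, and the -40 dB significance threshold and the name `phase_errors` are arbitrary choices.

```python
import cmath
import math
import random

N = 256  # analysis length in samples (illustrative)

def hann(n_samples):
    """Periodic Hann window, used to limit spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples) for n in range(n_samples)]

def dft(x):
    """Plain O(N^2) DFT of a real sequence, returning bins 0..N/2."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples // 2 + 1)]

def phase_errors(original, decoded, threshold_db=-40.0):
    """Phase error in radians per bin, only for bins within threshold_db of the peak."""
    window = hann(len(original))
    spec_a = dft([s * w for s, w in zip(original, window)])
    spec_b = dft([s * w for s, w in zip(decoded, window)])
    peak = max(abs(c) for c in spec_a)
    if peak == 0:
        return {}
    floor = peak * 10 ** (threshold_db / 20)
    # cmath.phase of the ratio gives the phase difference wrapped to (-pi, pi].
    return {k: cmath.phase(b / a)
            for k, (a, b) in enumerate(zip(spec_a, spec_b)) if abs(a) >= floor}

# Synthetic test signal: two sinusoids centred on bins 10 and 37.
original = [math.sin(2 * math.pi * 10 * n / N)
            + 0.5 * math.sin(2 * math.pi * 37 * n / N + 0.7)
            for n in range(N)]

# Stand-in for coding noise: small additive noise, time-aligned with the original.
rng = random.Random(0)
noisy = [s + rng.uniform(-0.001, 0.001) for s in original]

# The same signal delayed by one sample (circular shift), with no noise at all.
delayed = original[-1:] + original[:-1]

print("aligned, noisy :", phase_errors(original, noisy))
print("one-sample lag :", phase_errors(original, delayed))
```

In this toy case the time-aligned noisy copy shows only tiny phase errors at the significant bins, while a single sample of misalignment produces a large phase shift that grows with frequency even though nothing was lost. Bins that hold almost no energy are skipped because their phase is essentially arbitrary.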

Reply #39 – 2010-11-07 21:30:21

Quote from: Alexey Lukin on 2010-11-07 20:18:43

Quote from: pdq on 2010-11-07 19:10:30
So you are saying that the mp3 standard disallows the use of parametric encoding?

Yes, to the best of my knowledge.

I thought that mp3's intensity stereo was a form of parametric encoding?

Reply #40 – 2010-11-07 21:40:15

Quote from: pdq on 2010-11-07 21:30:21

I thought that mp3's intensity stereo was a form of parametric encoding?

Oh, yes, you are right! I don't think that it's a widely used feature though: Lame doesn't have it and FhG only uses it for very low bit rates.

Reply #41 – 2010-11-07 23:50:58

Quote from: Ouroboros on 2010-11-07 20:48:56

Quote from: HibyPrime on 2010-11-07 20:16:49
Actually, it is a perfectly logical statement. If the goal of the MP3 encoder is to mask the sounds that you (hopefully) would never hear anyway, it is logical to assume that pinpointing those sounds in an ABX test would be nearly impossible.

No, your original statement is illogical. You said:
Quote
when I heard the difference file in the youtube link it sounds a lot like the sounds that I occasionally notice in FLAC files

If you noticed those sounds in the FLAC file but not in the MP3 file then the codec would "fail" an ABX test. If it "passes" the ABX test (i.e. you can't detect the difference in a double blind test) then by definition you CAN'T hear those sounds in the FLAC file. Hence your statement is illogical (unless you are deliberately using a lossy codec that creates audible artifacts).

Quote from: HibyPrime on 2010-11-07 20:16:49
Not being able to pinpoint something does NOT mean it doesn't exist.
In a perceptual listening test it means almost exactly that. If you can't hear the difference, it effectively doesn't exist. The fact that it exists mathematically (when you subtract A from B) is a red herring.

Quote from: HibyPrime on 2010-11-07 20:16:49
my argument is that the mathematical difference between a WAV and MP3 file of the same song can help to reveal the differences between lossy and lossless. I am not arguing that either one is better.
It can reveal the mathematical differences, but not the audible differences. You are completely ignoring the fact that the noise that is removed is removed precisely because it is inaudible when played in the presence of the rest of the reconstructed signal. If it were audible as part of the original signal then the codec would "fail" an ABX test.

I think you are misunderstanding what I mean by small differences here. I'm not talking about a difference where you can just simply skip back 20 seconds in the song and hear it all over again. I'm talking about the kind of differences that are so small you can't be sure it's actually there.

If we defined reality by what can be easily understood, seen and measured, science would grind to a halt and very little progress would be made. Just because it is too small to be easily measured, does not mean it can't be heard. If a computer can 'hear' it with no problems, why is it so hard for you to accept that a person, on some level, can?

The sub-conscious's latent inhibition will disregard small stimuli at a given time, and at another time for no readily-apparent reason it will present them to your conscious mind. This is exactly the type of case where these tiny differences will always be hard to measure. They are designed to be hard to measure.

Also, you keep saying that the things that are removed are removed because they are inaudible. That is, again, the goal of applied psychoacoustics and can be proven that it has not been met millions of times over. As proof to that statement, I think it would be safe to say that anyone in the world with average hearing (and say, 5 years old or older) can hear the difference between a 32 kbps MP3 and a 16/44.1 lossless file. To say that the information removed is inaudible, I assume you were trying to imply that it occurs at higher bit-rates. At what rate does this occur?

Anyway, this is kind of off topic. I originally posted here (and signed up solely for this) because I was trying to find out whether the mathematical difference was actually what was removed from the WAV and not a result of timing differences. We all got the same answer, which is: yes and no.
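The timing question can be checked before subtracting anything. Below is a minimal Python sketch, assuming the decoded file is simply a delayed, slightly noisy copy of the original. The signals are synthetic stand-ins, the 23-sample delay is an arbitrary example value, and the function names are hypothetical.

```python
import math
import random

def mean_square_diff(original, decoded, lag):
    """Mean of (original[n] - decoded[n + lag])^2 over the overlapping samples."""
    n_overlap = min(len(original), len(decoded) - lag)
    if n_overlap <= 0:
        raise ValueError("lag leaves no overlap between the two signals")
    return sum((original[n] - decoded[n + lag]) ** 2
               for n in range(n_overlap)) / n_overlap

def best_lag(original, decoded, max_lag):
    """Delay of `decoded` (0..max_lag samples) that best matches `original`."""
    return min(range(max_lag + 1),
               key=lambda lag: mean_square_diff(original, decoded, lag))

def difference(original, decoded, lag):
    """Sample-wise difference after compensating for the delay."""
    n_overlap = min(len(original), len(decoded) - lag)
    return [original[n] - decoded[n + lag] for n in range(n_overlap)]

# Synthetic stand-ins: a two-tone "original" and a "decoded" copy that is
# delayed by 23 samples and carries a little additive noise.
rng = random.Random(1)
original = [math.sin(0.05 * n) + 0.5 * math.sin(0.31 * n) for n in range(2000)]
decoded = [0.0] * 23 + [s + rng.uniform(-0.01, 0.01) for s in original]

lag = best_lag(original, decoded, max_lag=50)
unaligned = difference(original, decoded, 0)
aligned = difference(original, decoded, lag)

print("estimated delay:", lag)
print("peak of unaligned difference:", max(abs(d) for d in unaligned))
print("peak of aligned difference:  ", max(abs(d) for d in aligned))
```

If the two files are subtracted without compensating for the delay, the difference is dominated by the music itself rather than by anything the encoder changed. Only the aligned difference approximates the added noise, and even then it says nothing about audibility in context.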

Reply #42 – 2010-11-08 00:14:52

my argument is that the mathematical difference between a WAV and MP3 file of the same song can help to reveal the differences between lossy and lossless.

Are you sure about this? Has it helped you? If so, then can you provide a sample as an example to help those who have trouble ABXing lossy from lossless?

I'm not particularly interested in -V5 since I can often ABX -V5 from lossless without much difficulty. Perhaps you can help me do this with -V3 or -V2.

Reply #43 – 2010-11-08 00:34:26

I think you are misunderstanding what I mean by small differences here. I'm not talking about a difference where you can just simply skip back 20 seconds in the song and hear it all over again. I'm talking about the kind of differences that are so small you can't be sure it's actually there.

Either they are inaudible or they aren't. If you can't ABX the original from the lossy file then the differences are inaudible in the context of the original file, no matter what they sound like or look like when you examine the differences on their own.

If we defined reality by what can be easily understood, seen and measured, science would grind to a halt and very little progress would be made. Just because it is too small to be easily measured, does not mean it can't be heard. If a computer can 'hear' it with no problems, why is it so hard for you to accept that a person, on some level, can?

Computers don't hear. People hear. The purpose of lossy compression is to exploit features in the human auditory process, not to fool computers.

Also, you keep saying that the things that are removed are removed because they are inaudible. That is, again, the goal of applied psychoacoustics and can be proven that it has not been met millions of times over. As proof to that statement, I think it would be safe to say that anyone in the world with average hearing (and say, 5 years old or older) can hear the difference between a 32 kbps MP3 and a 16/44.1 lossless file. To say that the information removed is inaudible, I assume you were trying to imply that it occurs at higher bit-rates.

Many thousands of properly conducted ABX tests prove that the bold statement is completely wrong with modern codecs configured properly, where people have been unable to tell the difference between the original WAV and the compressed MP3. The fact that old codecs, or very low bitrates, can be distinguished from the original, is just that - a fact. It contributes nothing to this argument, where the person who posted on Youtube was arguing that the differences he measured proved that even a well tuned MP3 encoder could be distinguished from the WAV, and his method was to measure the mathematical differences, not to do a blind listening test.