
Topic: fb2k's compensation for decoder delay is inconsistent

As I understand it, the first non-VBR-header frame of an MP3 file should decode to 529 (the usual decoder delay) + 1152 samples, and every frame thereafter should decode to exactly 1152 samples. The player should discard those first 529, and then it should discard however many samples the VBR header said was the encoder delay. When the last frames are reached, the player should discard however many samples the VBR header said was padding. It should be as simple as that, right?

foobar2000 1.1 seems to behave inconsistently in this regard. timcupery apparently noticed the problem in 0.9 a while back but didn't get any responses, perhaps because he asked about several things at the same time.

Here's a quick test to demonstrate.

flac.exe -c -d -s Boing_Boom_Tschak.flac | lame.exe -V0 - Boing_Boom_Tschak.mp3
The resulting MP3 has 1155 frames plus the VBR header frame with the LAME tag. The LAME tag specifies encoder delay 576 and padding 991. This checks out fine: 1155 frames × 1152 samples/frame = 1330560 samples, and 1330560 - 576 - 991 = 1328993, the same as the original FLAC.

Now load up the MP3 in foobar2000 1.1 and convert it to WAV. Output should be 1328993 samples and should align with the FLAC in a wave editor. No problem so far.

Now in foobar2000, edit the MP3's gapless playback information. Change encoder delay to 0 and length-of-track to 1330560. The playlist item's properties window should now show delay & padding of 0. Convert to WAV again. We would expect the output to be 1330560 samples. However, it's now 1330031. In a wave editor, it's apparent that fb2k removed 529 samples from the beginning, apparently overcompensating for decoder delay by a factor of 2!

It's not just happening in conversion, but in ordinary playback as well. Any MP3 having a delay and padding both set to 0 loses its first 529 samples. Because of this, seams are audible in what should be a gapless series of MP3s created by mp3DirectCut after splitting a CBR, no-bit-reservoir MP3 on frame boundaries.

Back to the Kraftwerk sample: some experimentation shows that after using fb2k to edit the gapless playback information, leaving the encoder delay at 0 and setting the padding to any value between 0 and 529 (length-of-track 1330560 to 1330031) results in the same 1330031-sample output. But setting the padding to 530 (length-of-track 1330030) results in the requested 1330030 samples. In a wave editor, one can observe 576 samples of encoder delay, followed by 1328993 samples of audio, followed by not 530 but 461 samples of padding, as that's what's required to reach 1330030. I notice 1330030 is 530 samples short of a frame boundary, which is probably not a coincidence.

I haven't even experimented with encoder delay values yet, and I'm finding it rather painful to wrap my head around this behavior. I don't understand why decoder delay even enters into the equation. I should be able to set encoder delay & padding to whatever I want and get those samples trimmed from the beginning and end, respectively, and never be exposed to decoder delay. What's going on here? Bug? Feature?

Reply #1
Where do you arrive at the conclusion that 529 is "overcompensating for decoder delay by a factor of 2?" 529 is not an even number. It is exactly the decoder delay for layer 3 files, in this case.

Reply #2
Quote:
Where do you arrive at the conclusion that 529 is "overcompensating for decoder delay by a factor of 2?" 529 is not an even number. It is exactly the decoder delay for layer 3 files, in this case.


The decoder presumably added 529 samples of silence to the beginning. These got removed, as they should be. But then another 529 samples got removed. Those samples are getting cut from the beginning of the non-delay part of the audio.

Reply #3
If you remove the gapless information or set the delay and padding to zero, then the decoder delay is not removed.

Reply #4
Quote:
If you remove the gapless information or set the delay and padding to zero, then the decoder delay is not removed.


If that's true, then there should be 529 extra samples of silence at the beginning. There's not.
The decoder delay is stripped, and another 529 samples beyond that are also stripped.

Reply #5
If that were the case, then the decoder would output (number of frames × 1152) + 529 samples. It does not. The only way to make it do so would be to feed it a garbage frame after the end of the file, then keep the first 529 samples of that frame.

The decoder always emits exactly 1152 samples per frame decoded.

Perhaps somebody here should better explain how MP3 gapless information works, where the delay samples are emitted, and how they are discarded?

How does LAME itself handle this when decoding files with or without gapless information?

Reply #6
Quote:
The decoder always emits exactly 1152 samples per frame decoded.


This doesn't help me. Decoder delay, as I understand it, precedes the first frame's worth of 1152 samples. Not every frame, just the first one in the stream. I assume fb2k, as the handler of the decoder's I/O, removes that delay so the first frame effectively yields 1152, like the rest. Are you saying that's not true?

The fact remains that:
  • given an 1155-frame file with 576 delay and 991 padding, fb2k emits 1328993 (1155×1152-576-991) samples, which makes sense, and inspection of the samples reveals nothing is missing; the sample stream matches up with the original.
  • given an 1155-frame file with 0 delay and 0 padding, fb2k should emit 1330560 (1155×1152) samples, but instead it emits 1330031 (1155×1152-529) samples, which doesn't make sense. Inspection reveals the missing 529 samples are in fact signal, not delay, as I already stated. Follow the steps above to reproduce it if you don't believe me.


So it seems that, with low encoder delay and padding values, 529 samples get cut in error.

Reply #7
Quote:
As I understand it, the first non-VBR-header frame of an MP3 file should decode to 529 (the usual decoder delay) + 1152 samples, and every frame thereafter should decode to exactly 1152 samples.

That's wrong: all frames of your MP3 file decode to exactly 1152 samples.

Quote:
The player should discard those first 529, and then it should discard however many samples the VBR header said was the encoder delay.

True. First, 529 samples are discarded, because that's the delay of the decoder. Additionally, if the file was encoded with a "LAME gapless tag", the specified number of subsequent samples (the encoder delay) are cut off too.

Quote:
When the last frames are reached, the player should discard however many samples the VBR header said was padding.

Almost right. It is necessary to clarify that there is an implicit padding of 529 samples (the decoder delay again), so padding values <= 529 have no effect and are ignored.

Quote:
How does LAME itself handle this when decoding files with or without gapless information?

lame-3.98.4.tar.gz:/frontend/main.c:201:
Code:
            if (*enc_delay > -1 || *enc_padding > -1) {
                if (*enc_delay > -1)
                    skip_start = *enc_delay + 528 + 1;
                if (*enc_padding > -1)
                    skip_end = *enc_padding - (528 + 1);
            }
            else
                skip_start = lame_get_encoder_delay(gfp) + 528 + 1;
Full-quoting makes you scroll past the same junk over and over.

Reply #8
Quote:
True. First, 529 samples are discarded, because that's the delay of the decoder. Additionally, if the file was encoded with a "LAME gapless tag", the specified number of subsequent samples (the encoder delay) are cut off too.

I guess I'm not understanding what the decoder delay is, then. I need a diagram showing how the 1152 samples in the first frame are split into encoder delay, decoder delay, padding, and samples of the original audio.

Quote from: Yirkha
It is necessary to clarify that there is an implicit padding of 529 samples (the decoder delay again), so padding values <= 529 have no effect and are ignored.

What is the relationship between decoder delay and padding? I thought padding was appended to the original audio (not the original audio + 529) to make its length a multiple of 1152 before encoding, and that decoder delay is silence added to the beginning upon decoding.

Bonus if this can be explained without using the words 'filterbank' and 'MDCT'.

Reply #9
Oh well, here you go:

Reply #10
Thanks! This is helping. I'm closer to understanding what foobar is doing. The fact that there are 3 related but discrete padding measurements is a revelation. I have more questions, but it'll take me a day or two to articulate them. Thanks again.