Thanks a lot, the links provided by Zoom and SebastianG are great.
As far as I understand chapter A of the specification, the last frame is padded and the exact position of the stream end is stored in the metadata:
QUOTE
A granule position on the final page in a stream that indicates less audio data than the final packet would normally return is used to end the stream on other than even frame boundaries. The difference between the actual available data returned and the declared amount indicates how many trailing samples to discard from the decoding process.
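To check that I read this correctly, here is a minimal sketch of the arithmetic in Python. The function name, the parameter names and the numbers are my own assumptions for illustration; none of them come from the specification or from libvorbis.

```python
def samples_to_discard(prev_granule: int, final_granule: int,
                       decoded_in_final_page: int) -> int:
    """Trailing samples to drop from the audio decoded out of the final page.

    prev_granule          -- granule position of the page before the final one
    final_granule         -- granule position declared on the final page
    decoded_in_final_page -- samples the final page's packets actually return
    """
    declared = final_granule - prev_granule
    if declared < 0:
        raise ValueError("final granule position lies before the previous one")
    # Whatever was decoded beyond the declared amount is encoder padding.
    return max(decoded_in_final_page - declared, 0)


# Hypothetical numbers: the final page decodes to 1024 samples, but its
# granule position says that only 136 of them belong to the stream.
discard = samples_to_discard(prev_granule=88_064,
                             final_granule=88_200,
                             decoded_in_final_page=1024)
print(discard)
```

With these made-up numbers, 888 of the 1024 decoded samples would be padding and get thrown away.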
And if I understand this paragraph correctly...
QUOTE
Ideally, vorbisfile internally reads an extra frame of audio from the old stream/position to perform lapping into the new stream/position. However, automagic crosslapping works properly even if the old stream/position is at EOF. In this case, the synthetic post-extrapolation generated by the encoder to pad out the last block with appropriate data (and avoid encoding a stairstep, which is inefficient) is used for crosslapping purposes. Although this is synthetic data, the result is still usually completely unnoticable even in careful listening (and always preferable to a click or pop).
...then the Vorbis encoder doesn't pad the last frame with zeroes, but with data extrapolated from the beginning of that frame, i.e. from the data it is actually meant to encode. This lets the last frame compress well, and the extra "imaginary" data doesn't matter because the decoder usually cuts it off. However, when crossfading two streams, this extra data can be used to crossfade into the new stream, effectively smoothing over any click that simply concatenating the two decoded frames might cause.
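To make the idea concrete, here is a rough sketch of such a crossfade in Python. This is not vorbisfile's code: I haven't checked which window it really uses, so the sin^2 ramp is my assumption, and `crosslap`, `old_tail` and `new_head` are names I made up.

```python
import math


def crosslap(old_tail, new_head):
    """Blend two equal-length sample lists with a sin^2 / cos^2 ramp."""
    if len(old_tail) != len(new_head):
        raise ValueError("both segments must have the same length")
    n = len(old_tail)
    out = []
    for i, (a, b) in enumerate(zip(old_tail, new_head)):
        # Fade-in weight rises smoothly from about 0 to about 1;
        # the fade-out weight is its complement, so the two always sum to 1.
        w_in = math.sin(0.5 * math.pi * (i + 0.5) / n) ** 2
        out.append((1.0 - w_in) * a + w_in * b)
    return out


# Hypothetical data: the old stream's padded tail sits at +0.5 and the new
# stream starts at -0.5. A hard cut would jump by 1.0 in a single sample.
old_tail = [0.5] * 8
new_head = [-0.5] * 8
blended = crosslap(old_tail, new_head)
largest_step = max(abs(y - x) for x, y in zip(blended, blended[1:]))
```

In this toy case the blend spreads the jump over the whole overlap, so no single sample-to-sample step comes close to the 1.0 of a hard cut.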
If I've got that completely wrong, someone please start barking right now. :-)
On the other hand, this means that gapless support is no more native in Ogg Vorbis than it is in MP3. The granule position of the stream end is stored in the metadata, just as the LAME tag does it for MP3. Apart from that crossfading trick with the extra data introduced by the encoder, no special measures are taken to avoid clicks between two different streams. So one actually has to hope that a stream doesn't end exactly on a frame boundary, because that would almost certainly introduce clicks due to the missing "crossfade reserve data".
So in the end, gapless playback is not a native feature of the Vorbis format, but depends entirely on the container (Ogg) and on functionality in the decoder (like the crossfading in the vorbisfile library).
Again, if I got this totally wrong, start barking now, please. :-)
@Gabriel:
Can you tell me how LAME encodes the last frame of a stream? Does it just pad it with zeroes, or does it fill in some "encoder friendly" data before encoding the whole frame, as the Vorbis encoder seems to do?
If LAME doesn't, and we're encoding at a high bitrate, I assume there won't be any crossfading reserve in the decoded last frame. So gapless MP3 would rely entirely on the hope that the last sample of the first stream matches the first sample of the second stream to avoid clicking. Is that correct?
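To illustrate what I mean by a click at the junction, here is a small Python sketch that compares the jump between two decoded streams with the largest step inside them. The helper names, the sample values and the factor of 3 are arbitrary assumptions of mine, not anything taken from LAME or from a real player.

```python
def junction_step(first_stream, second_stream):
    """Absolute jump from the last sample of one stream to the first of the next."""
    if not first_stream or not second_stream:
        raise ValueError("both streams must contain at least one sample")
    return abs(second_stream[0] - first_stream[-1])


def largest_internal_step(samples):
    """Largest sample-to-sample change inside a single stream."""
    return max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0.0)


# Hypothetical decoded samples in the range -1.0 .. 1.0.
track_a = [0.00, 0.10, 0.20, 0.30]
track_b = [-0.40, -0.35, -0.30, -0.25]

jump = junction_step(track_a, track_b)
typical = max(largest_internal_step(track_a), largest_internal_step(track_b))

# Arbitrary rule of thumb: a jump several times larger than anything
# inside the tracks is a candidate for an audible click.
likely_click = jump > 3 * typical
```

Here the junction jumps by about 0.7 while the tracks themselves never move by more than about 0.1 per sample, which is the kind of discontinuity I'd expect to hear without any crossfade reserve.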