I've had a few thoughts, based on my (limited) understanding, about which design decisions in MPEG-1 Layer 3 (.mp3) have hampered the quest for efficient, transparent VBR encoding. Perhaps there are some ways to edge it nearer to the efficiency of the Musepack (.MPC) format without departing from the .MP3 standard, which remains well supported in software and hardware players of all sorts.
Currently LAME APS (--alt-preset standard) typically averages about 190-200 kbps, while mppenc --quality 5 --xlevel (or --standard) seems to average around 155-165 kbps (about 18% fewer bits for .MPC).
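As a rough check of that percentage, here is a minimal sketch that takes the midpoints of the two quoted ranges. The midpoint values and the function name are my own choices for illustration, not measurements.

```python
def saving_percent(reference_kbps, candidate_kbps):
    """Percentage of bits saved by the candidate relative to the reference."""
    return 100.0 * (1.0 - candidate_kbps / reference_kbps)

# Midpoints of the ranges quoted above (190-200 and 155-165 kbps).
lame_aps_kbps = 195.0
mpc_q5_kbps = 160.0

print(f"MPC saves about {saving_percent(lame_aps_kbps, mpc_q5_kbps):.1f}% of the bits")
```

Taking the ends of the ranges instead of the midpoints moves the figure by well under one percentage point either way.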
I don't know enough of the details to shoot these ideas down myself, so perhaps you more knowledgeable people could do so! I'm only thinking out loud, really, and don't need to be treated with kid gloves if I'm being daft or showing my ignorance!
1. sfb21. As I understand it, the top spectral band, sfb21, has no gain factor (scalefactor) of its own, so it takes more bits to encode than it should. Yet for perceptual transparency (preserving clarity, sheen and sparkle), the frequencies in this band are still important.
2. Time resolution. The time resolution is limited by the block size, and for reproducing transients, avoiding pre-echo and so on, the finer the time resolution the better. The best that MP3 can achieve, as I understand it, is short blocks (576 samples), which at a 44.1 kHz sampling rate are 13 milliseconds long.
So I'm wondering: ignoring encoding speed, would there be an advantage (in a well-tuned codec) to upsampling CD audio from 44.1 kHz to 48 kHz before encoding? I'm guessing that LAME would need some attention to be optimally tuned at 48 kHz (or kSa/s).
I'd presume that the sfb21 band at 48 kSa/s would start at a frequency about 9% higher, and so closer to the lowpass. The lowpass could remain at 19 kHz, as with --alt-preset standard (which is becoming known as --preset standard in later LAME versions); that is a lower frequency when normalised to the sampling rate. This would reduce the proportion of audible information that has to be encoded in this inefficient band.
Is this a large enough advantage to be worthwhile?
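To put numbers on the "about 9%" and on the normalised lowpass, here is a small sketch. It takes at face value my presumption that band edges scale with the sampling rate; the standard defines a separate scalefactor-band table for each sampling rate, so the real sfb21 edge would have to be checked against those tables. The constant and function names are just for illustration.

```python
CD_RATE_HZ = 44100
TARGET_RATE_HZ = 48000
LOWPASS_HZ = 19000  # lowpass quoted above for --alt-preset standard

def rate_increase_percent(old_rate_hz, new_rate_hz):
    """How much higher a band edge lands if it scales with the sampling rate."""
    return 100.0 * (new_rate_hz / old_rate_hz - 1.0)

def fraction_of_nyquist(freq_hz, sample_rate_hz):
    """Frequency expressed as a fraction of the Nyquist frequency."""
    return freq_hz / (sample_rate_hz / 2.0)

print(f"Scaling 44.1 -> 48 kHz: {rate_increase_percent(CD_RATE_HZ, TARGET_RATE_HZ):.1f}% higher")
for rate in (CD_RATE_HZ, TARGET_RATE_HZ):
    print(f"19 kHz lowpass at {rate} Hz: {fraction_of_nyquist(LOWPASS_HZ, rate):.3f} of Nyquist")
```

So a fixed 19 kHz lowpass sits at roughly 86% of Nyquist at 44.1 kHz but only about 79% at 48 kHz, which is the normalised shift I have in mind.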
Also, the time resolution of standard blocks (1152 samples) would improve from 26 ms to 24 ms and short blocks would shorten from 13 ms to 12 ms.
Although only a slight improvement, this might allow marginally fewer short blocks to be used, aiding efficiency.
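The durations quoted above follow directly from the sample counts, as this sketch shows. It covers only the 1152- and 576-sample units mentioned in this post (in MPEG-1 Layer 3 terms, 1152 samples is one frame and 576 samples is one granule); the function name is hypothetical.

```python
def duration_ms(num_samples, sample_rate_hz):
    """Duration in milliseconds of a run of samples at the given rate."""
    return 1000.0 * num_samples / sample_rate_hz

for num_samples in (1152, 576):
    at_cd_rate = duration_ms(num_samples, 44100)
    at_48k = duration_ms(num_samples, 48000)
    print(f"{num_samples} samples: {at_cd_rate:.2f} ms at 44.1 kHz, {at_48k:.2f} ms at 48 kHz")
```

Either way the improvement is the same 8% or so reduction in duration, whatever unit is being measured.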
For the resampling, I'd guess that the fastest method with perceptually inaudible frequency-domain ripple could be used. It might even be possible to estimate whether the expected ripple is masked in each frame/block/granule and, only when it isn't, to recompute using a slower resampling method (such as a fully band-limited one). I'd imagine that a discrete transform isn't as amenable to resampling via the frequency domain as a full Fourier transform would be, but if it were, it might add the least computational overhead to the change of sampling rate.
Of course, side effects could be many, including a lack of on-the-fly decoding support in most audio CD burning software, and possibly a lack of support in some hardware players that might only handle 44.1 kHz.
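To make the fast end of that trade-off concrete, here is a minimal sketch of 44.1 kHz to 48 kHz resampling by plain linear interpolation. This is only my stand-in for "a fast method": it is not band-limited, nobody is claiming LAME does this, and whether its artefacts would actually be masked is exactly the open question. The function name is hypothetical.

```python
from fractions import Fraction
import math

def resample_linear(samples, src_rate=44100, dst_rate=48000):
    """Resample by linear interpolation (fast, but not band-limited)."""
    ratio = Fraction(dst_rate, src_rate)  # reduces to 160/147 for 44.1 -> 48 kHz
    n_out = (len(samples) * ratio.numerator) // ratio.denominator
    out = []
    for k in range(n_out):
        # Exact position of output sample k on the input sample axis.
        pos = Fraction(k) / ratio
        i = pos.numerator // pos.denominator
        frac = float(pos - i)
        # Hold the last input sample when there is no right-hand neighbour.
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

# 10 ms of a 1 kHz tone: 441 input samples become 480 output samples.
tone = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(441)]
resampled = resample_linear(tone)
print(len(tone), "->", len(resampled))
```

A slower, band-limited method (a windowed-sinc or polyphase filter, say) would replace the two-point interpolation with a longer filter, which is where the extra computation would go.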
Maybe even the greater number of frames/granules per second would actually waste as many bits as are saved in sfb21 etc. (especially if short block switching doesn't give any advantage).
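One part of that cost can be estimated: each MPEG-1 Layer 3 frame carries a 4-byte header plus 32 bytes of side information in stereo, whatever the audio content. The sketch below counts only that fixed part; it ignores scalefactors, CRC, padding and any change in how well the audio data itself compresses, so treat it as a lower bound on the overhead rather than a prediction.

```python
SAMPLES_PER_FRAME = 1152
HEADER_BYTES = 4
SIDE_INFO_BYTES_STEREO = 32  # MPEG-1 Layer 3, two channels

def frames_per_second(sample_rate_hz):
    """Number of 1152-sample frames needed per second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME

def fixed_overhead_kbps(sample_rate_hz):
    """Bitrate spent on frame headers and side information alone."""
    bits_per_frame = 8 * (HEADER_BYTES + SIDE_INFO_BYTES_STEREO)
    return sample_rate_hz * bits_per_frame / SAMPLES_PER_FRAME / 1000.0

for rate in (44100, 48000):
    print(f"{rate} Hz: {frames_per_second(rate):.2f} frames/s, "
          f"{fixed_overhead_kbps(rate):.3f} kbps fixed overhead")
```

On these assumptions the move to 48 kHz costs a little under 1 kbps in fixed overhead, which is small next to the 190-200 kbps total, so any real loss would have to come from elsewhere.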
Anyway, I'd appreciate comments.
I presume that trying it out from a resampled .WAV would be pointless, assuming that --alt-preset standard is only truly optimally tuned for 44.1 kHz.
Regards,
Dick Darlington