Using insane settings with mp3
Reply #199 – 2005-12-02 18:18:22
Things have changed since I started the thread. Gabriel has started a thread collecting problem samples with heavy artefacts, so Lame development is aware of the specifically problematic character of such samples. Now there is hope that the much appreciated Lame development takes care of this, and some day we might have a version that behaves much better in this respect. On the technical side things have already started, as 3.97b2 now uses the bit reservoir.

My personal ambition at the moment is to find the best Lame version and usage so far for cbr320. I will definitely try a listening test and I'm preparing for it right now. I had planned to do some more things (continue my bitrate statistics, do some waveform analysis), but I'm giving that up (sorry, I was talking about it on HA). I don't think it's valuable any more and will concentrate on the listening test. So I will not contribute to this thread any more (only in case very important items come up, which I don't expect).

To finish, I'd like to sum up some technical and conceptual points for Lame 3.97b. Maybe one or the other of them is considered helpful for Lame development.

a) As I mentioned earlier, the average audio content within cbr320 frames should be close to 8100 bits (the remaining bits in a frame are used for administrative purposes). This corresponds to an audio stream bitrate of 310 kbps (44.1 kHz sampling frequency, i.e. about 38.28 frames of 1152 samples per second). Lame 3.97 wastes some 5% of frame space, yielding an effective audio bitrate of some 295 kbps. The average audio bitrate with castanets, for instance, is 298 kbps. I think this can be improved. Lame 3.90.3, for instance, provides an average audio stream of 308 kbps (the mean of the average bitrates of 34 samples).

b) I think it is wise to restrict the size of a frame's audio content with respect to the standard, the way 3.97b2 or 3.90.3 do. But I can't see a reason why the bit reservoir itself is restricted to 396 bytes, the way Lame 3.97b2 or 3.90.3 do it.
Using the full 511 bytes can provide better quality.

c) When an encoder uses the bit reservoir with cbr320, this effectively means VBR for the audio stream in a specific way. Lame 3.90.3 api uses a rather defensive strategy. The lowest bitrate of a frame's audio content averages 267 kbps over my 34 samples and is above 255 kbps in every one of them. For difficult frames up to 415 kbps were used. Lame 3.97b2 behaves differently. The bitrate of a frame's audio content is rather often below 250 kbps and went down to 208 kbps on my samples. To me this is a bit far from what I'd expect when using cbr320. The bit reservoir is restricted anyway.

d) The last point brings me to a conceptual question about the variable bitrate audio stream (called VBRAS here to differentiate it from ABR/VBR/CBR, which address the transport stream). VBRAS behavior is most defensive with CBR, followed by ABR; VBR comes last. I can imagine that a lack of defensiveness is the reason for faulty decisions of the encoding machinery in some situations. After all, the machinery doesn't really know what's good enough. VBR is meant to address quality, but I'm not sure whether the idea of a target bitrate doesn't affect this. Maybe bringing a larger security margin into these decisions can help make sure to a greater extent that encoding is good enough even in difficult situations. The security margin might scale with the demanded quality, up to having (at the highest quality level) a defensive VBRAS usage similar to the way cbr320 can do it (but not necessarily strict CBR, thus giving the possibility of saving bitrate when it can be safely done with a good security margin).

Edited to clarify the safety margin idea: The idea of a safety margin does not address the usual quality measurement techniques. Instead it addresses, in a more or less unintelligent brute-force way, side conditions which are considered potentially relevant for achieving high quality.
One of the most attractive and easily implementable targets might be the minimal VBRAS bitrate allowed. There could be a restriction on minimal VBRAS bitrate (VBRAS should be in focus, not transport frame bitrate!) which scales with quality demand. Some more relevant factors can come into play; for instance the 'loudness' of a frame should have an influence on the minimal VBRAS bitrate. I think a very rough approach to classifying a frame's loudness range is appropriate for this. More targets for a scalable safety margin may be the sensitivity of pre-echo detection or the detection of other special situations, for instance the decision-making for short-block switching or ms-stereo switching.

I see that concentrating totally on quality in this or another way brings some problems for comparative listening tests, where for a fair comparison one tries to use identical bitrates for the different candidates. This problem, however, already exists when using VBR (though it might get worse). IMO this should not be a reason not to improve quality this way. Instead it can be seen as an issue of how to perform listening tests in a practical sense. At the moment, bitrate is the vital element in listening tests for choosing the way an encoder is used. But there is no reason why quality considerations should not be given the same respect. For instance, within a listening test targeting something like 100 kbps encodings, why not use Lame 3.97b abr 104 side by side with iTunes CBR96? (This makes sense only in case Lame abr 104 is considered to give essentially better results than abr 96.) The results concerning quality and bitrate are transparent to the reader, and a potential Lame user may be more interested in whether or not mp3 is more or less competitive within the considered bitrate range, even if the bitrate is a bit higher, than in how Lame behaves at exactly 96 kbps.
Maybe the key to overcoming the listening test issue is to talk about, say, a '100 kbps (or 200 kbps) range listening test' instead of a '96 kbps (or 192 kbps) listening test'. Talking about a 96 kbps or 192 kbps test puts a rather technical issue in focus (the size of a certain transport frame), which is not a good idea anyway. Maybe this also brings relief to, for instance, considerations of whether or not to use something like '-athaa-sensitivity 1' within -V5.
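
For anyone who wants to check the figures in a) and c) or play with the minimal VBRAS bitrate idea from d), here is a rough Python sketch. It is only an illustration under stated assumptions, not anything taken from Lame. The one hard fact in it is that an MPEG-1 Layer III frame covers 1152 samples. All function names are my own, and the floor values (255 kbps at the highest quality level, 32 kbps at the lowest, half of that for 'quiet' frames) are hypothetical numbers picked just to show the scaling.

```python
# Rough sketch only. The frame length is an MPEG-1 Layer III fact;
# the floor rule and all its numbers are hypothetical illustrations.

SAMPLES_PER_FRAME = 1152   # samples covered by one MPEG-1 Layer III frame
SAMPLE_RATE_HZ = 44100     # sampling frequency used in the post


def frames_per_second(sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of mp3 frames per second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME


def audio_kbps(bits_in_frame, sample_rate_hz=SAMPLE_RATE_HZ):
    """Audio stream (VBRAS) bitrate of a frame holding this many audio bits."""
    return bits_in_frame * frames_per_second(sample_rate_hz) / 1000.0


def bits_per_frame(kbps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Inverse of audio_kbps: audio bits per frame at a given bitrate."""
    return kbps * 1000.0 / frames_per_second(sample_rate_hz)


def min_vbras_kbps(quality, loud,
                   floor_at_best=255.0,   # assumed floor at highest quality
                   floor_at_worst=32.0,   # assumed floor at lowest quality
                   quiet_factor=0.5):     # assumed relaxation for quiet frames
    """Hypothetical minimal VBRAS bitrate.

    quality: 0.0 (lowest) to 1.0 (highest quality demand).
    loud:    very rough two-class loudness flag for the frame.
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be within 0.0 and 1.0")
    floor = floor_at_worst + quality * (floor_at_best - floor_at_worst)
    return floor if loud else floor * quiet_factor


def clamp_frame_bits(requested_bits, quality, loud):
    """Never give a frame fewer audio bits than the floor allows."""
    floor_bits = bits_per_frame(min_vbras_kbps(quality, loud))
    return max(requested_bits, floor_bits)


if __name__ == "__main__":
    print(round(audio_kbps(8100), 1))        # audio bitrate for 8100 bits per frame
    print(round(bits_per_frame(208)))        # audio bits in a 208 kbps frame
    print(round(audio_kbps(clamp_frame_bits(bits_per_frame(208), 1.0, True))))
```

Under these assumptions `audio_kbps(8100)` comes out at about 310 kbps, which matches a), and a frame that the encoder wanted to code at 208 kbps would be lifted to the assumed 255 kbps floor at the highest quality level. A real encoder would of course also have to respect the bit reservoir and frame size limits, which this sketch ignores.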