Topic: ABX'ed AAC 128 VBR (log posted). Angry :(

ABX'ed AAC 128 VBR (log posted). Angry :(

Reply #50
Hate to drag out the OT aspect of the discussion, but I saw a prime opportunity to dredge up ol' Jan Sloot's story. According to some discussion on doom9 and elsewhere, reports indicate the quality was poor but the movies recognizable. It was postulated that the system was somewhat similar to MIDI -- a huge database of specially selected "generic" content referenced within some sort of compressed text file ("as little as 8kb!"). Most folks (myself included) still think the trick was all smoke and mirrors, but it goes to show there are plenty of tricks out there, thanks to the inherent subjectivity of quality. Transparency might not be so easy to attack this way because of that pesky "indistinguishable" bit, but the point (shaky as it is) remains -- there may be breakthroughs yet, be they algorithmic, infrastructural, or downright tricks. SBR (and to a lesser degree, band folding), PNS, and so forth remain consigned to the category of "tricks" merely because they weren't good enough data models to achieve transparency ("super" SBR for much higher frequencies, carefully used IS, and some other tools and cases notwithstanding). They remain useful for other things (increasing the perceived fidelity of non-transparent music!) but not for achieving transparency at a lower bitrate. It is conceivable that clever new strategies will be devised (and are being devised right now) that will.

But I'm certainly inclined to agree with C.R.Helmrich that the emergence of breakthrough technologies is becoming increasingly unlikely and difficult given our current understanding of lossy complexity models (rate-distortion as a function of the absolute threshold of hearing, and the psychoacoustic models used to approach those bounds). The seemingly asymptotic pattern of lossy codec improvement over time reinforces this understanding. Most work nowadays goes into defining and quantifying this perceptual distortion and addressing it in code -- within and beyond standard limits. Within the standard limits, you simply have a more restricted format toolset to make improvements with. The benefit of new formats in this area is the ability to leverage our evolving understanding of these models, masks, quantization metrics, etc., to address shortcomings in our existing models and add new tricks to the toolset. Unless a viable MDCT alternative emerges, a codec needs to deal with the spread of quantization noise over the transform window (detecting these occurrences and upping the bitrate locally is a crude fix for the hated "pre-echo" problem; I think Opus can divide a problem block near the boundary, controlling the transient masking window). So we'll probably continue to need tricks to get around the problems introduced by our other tricks. MP3 was beleaguered by, among other things, the trade-offs taken to ensure backwards compatibility with MP1/2.
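To make the pre-echo point concrete, here's a minimal sketch of the block-switching idea in Python. It's a toy energy-ratio transient detector, nothing more -- the sub-block count, threshold, and function name are all hypothetical illustration, not any real codec's algorithm:

import numpy as np

def pick_window(frame, n_sub=8, threshold=4.0):
    """Split a frame into sub-blocks; if any sub-block's energy jumps
    sharply relative to its predecessor, flag a transient and switch to
    short transform windows so quantization noise stays confined near
    the attack (the crude anti-pre-echo fix described above)."""
    sub = np.array_split(frame, n_sub)
    energies = np.array([np.sum(s.astype(np.float64) ** 2) + 1e-12 for s in sub])
    ratios = energies[1:] / energies[:-1]
    return "short" if np.any(ratios > threshold) else "long"

# Example: silence followed by a sharp attack triggers short windows.
frame = np.concatenate([np.zeros(512), np.random.randn(512)])
print(pick_window(frame))  # -> "short"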

It's an open problem in the sense that no 100% accurate psychoacoustic model exists. Maybe one never will, since humans with different ear and brain physiologies are susceptible to different forms of distortion. This is one reason we rely on larger listening tests.
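As an aside on why test size matters: the usual way to score an ABX log is a one-sided binomial test against guessing (p = 0.5). A quick sketch using only the Python standard library (the function name is mine, not any forum tool's):

from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials`
    by pure coin-flipping. Lower means stronger evidence of audibility."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # ~0.038: a typical "pass"
print(abx_p_value(6, 8))    # ~0.145: the same hit rate, but too few trials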
Copy Restriction, Annulment, & Protection = C.R.A.P. -Supacon

ABX'ed AAC 128 VBR (log posted). Angry :(

Reply #51
Are you really willing to claim that your source model contains all the prior information available when we know that a signal is, say, the PCM from someone's CD collection rather than a stream of entirely random bits? Are you also claiming that you have objectively convincing evidence of this? You might as well be saying "the asymptotic optimality of LZ for Markov sources means that the GZIP'd size of the Library of Congress is so close to its Kolmogorov complexity that no compression algorithm will do significantly better." In both cases our source models are nice and useful but certainly wrong.
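A toy illustration of that gap (an analogy, not a proof): the byte string below is completely described by its one-line generator -- a few dozen bytes of Python -- yet DEFLATE, lacking that source model, leaves it vastly larger than that description.

import zlib

# ~489 KB of trivially structured data: the decimal integers 0..99999.
data = "".join(str(i) for i in range(100_000)).encode()
packed = zlib.compress(data, 9)
print(len(data), len(packed))  # DEFLATE shrinks it, but its output stays
                               # far above the size of the generator itself.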


The need for audio formats to be vaguely streamable, to require relatively limited amounts of memory for decode, and to decode using some reasonable level of power is a much more relevant limit on compression.  You can of course come up with very clever formats that squeeze all kinds of redundancy out of bitstreams, but realistically people aren't going to be interested in using them.

 

ABX'ed AAC 128 VBR (log posted). Angry :(

Reply #52

That's an excellent point. Vaguely streamable, or extremely so in the case of low-delay Opus. Some (CM-based?) wide-window redundancy models are certainly off-limits for this reason. Most songs are at their core extremely formulaic -- it is often through a distinctive spread of beat, repeating instruments, and somewhat "context-predictable" aspects that a song is born. Methods to take advantage of these longer-term redundancies would most likely kill streaming potential (solid archive, anyone?) or sacrifice speed and/or fidelity given today's computing power.
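A rough sketch of that streaming-vs-solid trade-off, using generic DEFLATE as a stand-in for any redundancy model (the 4 KB chunk size and the toy "song" are arbitrary assumptions): compressing short independent chunks, as streaming and seeking require, cannot amortize redundancy the way one solid pass over the whole signal can.

import zlib

song_like = b"verse chorus verse chorus bridge chorus " * 5000  # ~200 KB toy "song"

solid = len(zlib.compress(song_like, 9))
chunked = sum(len(zlib.compress(song_like[i:i + 4096], 9))
              for i in range(0, len(song_like), 4096))
print(solid, chunked)  # solid wins: each independent chunk has to re-learn
                       # the redundancy from scratch and pays that cost again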
Copy Restriction, Annulment, & Protection = C.R.A.P. -Supacon