Why is MPC perceived to be the best? (an off-topic audio encoding discussion)
This is a question that has floated through my mind for most of a year, but only yesterday did it become clearer to me.

Actually, it's a two-part question...

... 1. Is MPC commonly accepted among the HA community as the best psychoacoustic encoding format? (i.e., the most efficient at achieving perceptual transparency.)

... 2. If so, why?

The first item I've heard stated quite frequently, but have never seen any results of "transparency threshold tests" that would reveal the superior efficiency of MPC. I've heard that MPC uses superior encoding technology, but I'm referring more to the end result of such development efforts...the perceived sound quality, as measured against other codecs at the point of perceptual transparency for a significant number of people.

These concerns on my part were born from a post I made here, where two points were brought up: MPC statistically ties other formats at 128kbps, and no other test results are known to exist. The thread portion ended up in the recycle bin, but I'm taking the chance that my concerns about calling MPC "the best" weren't the reason it was put there.

Hence, I want to bring up this idea in a different context in the off-topic forum (in the hope that this will be the correct area for it).

What I'd like to see, for instance, for the education of myself and others, would be a results summary like the following (though this is a very simplistic example)...

Format............Perceptual Transparency Threshold (nominal bitrate across samples tested)
MPC.................nnn kbps
AAC.................nnn kbps
Vorbis..............nnn kbps

...and so forth

Granted, VBR is more efficient at mid-bitrates and up, and quality-based VBR modes aren't bitrate-centric, but we need some means of measurement and comparison between codecs in this context. So if we don't call it "nominal bitrate", then perhaps "average filesize per minute of audio across all samples".
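
To make the two measures concrete, here is a minimal sketch of how they could be computed from the encoded files. The function names and the sample numbers are made up for illustration; nothing here comes from an actual test.

```python
def nominal_kbps(files):
    """Average bitrate in kbps (1 kbps = 1000 bits/s) across all encoded samples.

    files: list of (size_in_bytes, duration_in_seconds), one entry per sample.
    """
    total_bytes = sum(size for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return total_bytes * 8 / 1000 / total_seconds


def megabytes_per_minute(files):
    """Average file size per minute of audio, in MB (1 MB = 1,000,000 bytes)."""
    total_bytes = sum(size for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return (total_bytes / 1_000_000) / (total_seconds / 60)


# Hypothetical data: two 30-second samples of 600,000 bytes each.
samples = [(600_000, 30.0), (600_000, 30.0)]
print(nominal_kbps(samples))          # average bitrate over both samples
print(megabytes_per_minute(samples))  # same information, expressed as size
```

Both numbers carry the same information; the second just avoids the word "bitrate" for quality-based VBR modes.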

Perceptual Transparency Threshold could have a fixed target, like >90% of samples with 5.0 subjective ratings, and non-differentiable from the reference with ABX testing.

This kind of test has been discussed before, and has been mostly viewed as having little "real-world value". And I agree. Roberto's tests are much more relevant for most music listeners, and for determining the best formats for useful purposes like streaming audio, portable players, etc.

Many of us (including me) have trouble testing even these bitrate ranges, so higher ones would be even more tedious, and would answer fewer pressing questions.

My point, though, is this: how can MPC be called "the best for achieving transparency" without a test such as this? (So far it has only been shown to be "among the best" at lower rates.)

QUOTE (Kalamity @ Feb 9 2004, 03:00 AM)
Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here?

True, but I would not base this test on how the codecs are marketed. They can call whatever they like "transparent" or "CD quality". But ABX results don't lie.

QUOTE (Kalamity @ Feb 9 2004, 03:00 AM)
A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance.

That's exactly what I had in mind.

QUOTE (2Bdecided @ Feb 9 2004, 6:51 AM)
Just how many people are going to give you anything except 5.0 for all samples?

(I can think of some - but if you slashdot it to get a large number of listeners, I bet the percentage is low!)

There's also the possibility that people who can hear problems with various samples at the settings you suggest have already reported it here. Maybe you could somehow analyse this data?

Some people will have already made themselves more sensitive to one codecs artefacts than another. This would likely bias your test.

A rating scale wouldn't even be used for this kind of test. Only ABX. If the tester can get p<0.05, then they "move up" to the next higher encoding rate for the format. If they can't, then their transparency threshold for that sample encoded in that format lies below this rate and above the last one they could differentiate.
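
For reference, the p value in an ABX run is usually taken as the chance of scoring at least that many correct answers by pure guessing. A minimal sketch, assuming a one-sided binomial test with a 50% guess rate per trial (the 16-trial run below is only an example, not a proposed test length):

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def can_differentiate(correct, trials, alpha=0.05):
    """True if the tester 'moves up' to the next higher encoding rate."""
    return abx_p_value(correct, trials) < alpha


# Example: in a 16-trial run, 12 correct clears p<0.05 but 11 correct does not.
print(can_differentiate(12, 16))
print(can_differentiate(11, 16))
```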

And "artifact familiarity" won't have a statistical impact if there are enough testers. Some people would be "attuned" to a format's particular artifacts, but many others won't be.

QUOTE (sthayashi @ Feb 9 2004, 11:47 AM)
There are two additional codecs that ought to be tested, WavPack Hybrid and OptimFrog DualStream. These are codecs that have never been formally tested in lossy modes, and Somebody should do it™

I agree that they should be tested at some point against the ones pointed out previously in this thread. But it should really be done in a future test, because a) there will be enough test groups as it is with the formats discussed, and b) I want to first pare down these five most commonly used formats, before tackling others.

QUOTE (music_man_mpc @ Feb 9 2004, 3:30 PM)
We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.

I agree. And the more I think about it and try to "envision" what the test would be like, the more I think we should have one test run for each format, close enough together to minimize unfairness by "version variance" between encoders.

And I don't want to only have 3 rates, as I previously stated. It wouldn't be enough. "Vorbis -q 4 isn't transparent to me, but -q 5 is." OK, so transparency for this tester on this sample in this format has been narrowed to within 32-52kbps of the "line". Not accurate enough. I want the scale to be as granular as possible.

As you point out in your example, I'd like to know a sample's transparency to a particular person with a format to within 10kbps or so.
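
To show what "within 10kbps" would mean in the results, here is a rough sketch of how one tester's ABX runs on one sample in one format could be reduced to a bracket around their threshold. The bitrates and p values are invented, and the rule for handling inconsistent results (a detection above an earlier non-detection resets the bracket) is my own assumption.

```python
def transparency_bracket(results, alpha=0.05):
    """Bracket one tester's transparency threshold for one sample and format.

    results: list of (nominal_kbps, abx_p_value) pairs, in any order.
    Returns (lower, upper): the highest rate still differentiated, and the
    lowest rate above it that was not. Either value may be None.
    """
    lower = None
    upper = None
    for kbps, p in sorted(results):
        if p < alpha:
            lower = kbps
            upper = None  # a detection here overrides any miss at a lower rate
        elif upper is None:
            upper = kbps
    return lower, upper


# Hypothetical ladder in roughly 10kbps steps.
runs = [(128, 0.01), (138, 0.03), (148, 0.20), (158, 0.60)]
lower, upper = transparency_bracket(runs)
print(lower, upper)  # threshold lies between these two rates
```

The finer the ladder of encoder settings, the narrower that bracket gets, which is the whole argument for more than 3 rates.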

QUOTE (MGuti @ Feb 9 2004, 4:00 PM)
i recomend splitting the test up by encoder. if its a strictly ABX test, then it woun't be quite as time consuming, IMO, as normal test. you can either tell or you can't.

My thoughts exactly. I'm hoping it'll make the whole thing more manageable in "smaller chunks". But it would have to be almost a "marathon" of tests. If we wait a month between testing each format, then too many people would say "Yeah, but you tested the old MPC against the new Vorbis v1.3", etc. We could, if possible, prepare for testing all the formats at once (over, say, 6-8 weeks), then we could fire off one test, 11 days, then 3 days to compile and publish results, then fire off the next test, 11 days, 3 days to compile/publish, ...and so forth. Prep time in between tests would be minimal if we were set up at the beginning as much as possible. The whole thing, with 5 formats, would take about 10 weeks.
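
The arithmetic behind the 10 weeks, as a small sketch. The start date is arbitrary, and the last two format names are placeholders since the final list isn't settled; the prep period is not included.

```python
from datetime import date, timedelta


def build_schedule(start, formats, test_days=11, publish_days=3):
    """Back-to-back test runs: each format gets a test window, then a
    compile/publish window, and the next test starts right after."""
    schedule = []
    cursor = start
    for name in formats:
        test_end = cursor + timedelta(days=test_days)
        published = test_end + timedelta(days=publish_days)
        schedule.append((name, cursor, published))
        cursor = published
    return schedule


start = date(2004, 3, 1)  # arbitrary example date
formats = ["MPC", "AAC", "Vorbis", "format 4", "format 5"]  # last two are placeholders
schedule = build_schedule(start, formats)
for name, begins, published in schedule:
    print(name, begins, published)

total = schedule[-1][2] - start
print(total.days, "days =", total.days // 7, "weeks")  # 5 x (11 + 3) days
```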

QUOTE (ChristianHJW @ Feb 9 2004, 4:03 PM)
To make this test sensible, you have to remove the 'noise', i.e. the people who dont have the necessary training to differentiate between those codecs.

I recommend to achieve this by either

- doing a pretest, like users have to find out what the 320 kbps MP3 and the original CD is ( quite easy  )

- add the original source ( CD ) to the listening test, and null every vote that ranks the original worse than one of the compressed samples

Pre-tests may be required to determine which particular format variants would be the "most fair to test at mid-high bitrates", but there would be no subjective rating. ABX only. It would not be possible to "rate a reference" in this kind of test. With each encoder setting tested, it's just p<0.05 or p>0.05. The former shows the tester can hear a difference (not transparent); the latter means they can't reliably do so (transparent for that tester and sample). Maybe we should define a "gray area" of 0.05<p<0.07, perhaps, to show an "exploded view" of the threshold when compiling results. I'm not sure of how much value this would hold, though. It can always be determined at the end, and even shown both ways if preferred.
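
A small sketch of how results could be sorted when compiling, reading p<0.05 as "the tester heard a difference" (the same reading as the "move up" rule earlier in this post) and treating the gray area as the band from 0.05 up to 0.07. The labels and the exact handling of the boundaries are my assumptions.

```python
def classify(p, alpha=0.05, gray_upper=0.07):
    """Sort one ABX result into one of three bins for the results summary."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < alpha:
        return "differentiated"      # not transparent for this tester/sample
    if p < gray_upper:
        return "gray area"           # near the threshold, shown separately
    return "not differentiated"      # transparent for this tester/sample


for p in (0.01, 0.06, 0.30):
    print(p, classify(p))
```

Since the gray area is only a reporting choice, the same raw p values can be published both ways by calling it with gray_upper equal to alpha.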

QUOTE (Continuum @ Feb 9 2004, 4:08 PM)
I think it's still quite safe to assume that MPC is the best encoder for transparent lossy.

It's that word, "assume", that we will be killing with this test. If MPC wins, no need to "assume" any more. If not, or if it ties for the top position with other formats, then "assumptions" can summarily be corrected.

QUOTE (Continuum @ Feb 9 2004, 4:08 PM)
IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).

Then, as Tyler says, that is simply the nature of MPC. As in Roberto's tests, we should avoid worrying too much about how formats "scale" their quality settings. If MPC would indeed perform better with a shallower quality "slope" between q4.1 and q5, then maybe it should be modified to do just that.

This idea is simply to test the best encoder version that each of these formats brings to the table when measured at the threshold of perceptual transparency. And as mentioned before, we could spend the next few weeks pre-testing the different versions of each encoder (especially the ones with newer versions), and picking ideal samples for this kind of test.
