Full Version: Flawed conclusions on WMA
Hydrogenaudio Forums > Hydrogenaudio Forum > Listening Tests
astroboy
Hello forum :)

I have been reading this forum, and it is used as a reference by people all over the internet to come to conclusions about which audio codec to use. Much of the time, the ordinary person's decision comes down to WMA vs MP3, as that is where the majority of hardware support lies.

In giving their advice, many people cite http://www.rjamorim.com/test/multiformat128/results.html and conclude that MP3 (encoded with LAME) is superior to WMA in audio quality, and therefore people should use that.

Looking at the test results, I see that other than atrac, the results merely reflect the different bit-rates being played. As in, there is actually no conclusion at all from the results other than that higher bit rates sound better. Furthermore, I cannot seem to find the actual details of encoder settings used, but the top ones, including LAME MP3 appear to be using variable bit-rate. Comparing this to constant bit-rate WMA files is absolutely ridiculous. Considering how close WMA comes to the others, it is almost possible to conclude that WMA is superior to them when factoring in the difference between constant and variable bit-rates.

Of course on top of this is the ability for the test, due to its public nature, to have been tainted by anyone who has an agenda against a particular codec. It also has significant error due to the variability of equipment used by the listeners. Another factor is that people who are used to a particular sound, such as that of iTunes encoded AAC files, may be more likely to rate that above the other codecs. Furthermore, there are a number of other tests conducted by others in both controlled and public environments that appear from my understanding to have come to the opposite conclusion to the anti-WMA sentiment in this forum.

I would be happy for someone to point out where my analysis is wrong, otherwise it would seem to me that when people are criticising WMA they should state this as their personal opinion and stop pointing to the 128 kbps listening test as some kind of proof that WMA is inferior to even MP3. I 'thought' the goal of many people in this forum was to enable people to make informed decisions about various codecs, but from my reading of this forum, I do not get that impression.
kanak
Um, no.

From (http://www.rjamorim.com/test/multiformat128/presentation.html)

QUOTE

The encoders and parameters tested are:

* LAME encoder 3.96 -V5 --athaa-sensitivity 1
* Apple iTunes 4.2 128kbps AAC
* Ogg Vorbis aoTuV tuning b2 -q 4.35
* Musepack 1.14b --quality 4.15 --xlevel
* Sony Atrac3 132kbps
* Microsoft WMA9 Std Bitrate VBR 128kbps

Clearly, VBR is used for wma.

Also if you look at the end of the link you provided, there is the ANOVA analysis of the results
QUOTE

Vorbis aoTuV is tied to Musepack at first place, Lame MP3 is tied to iTunes AAC at second place, WMA Standard is in third place and Atrac3 gets last place.


Also, since ABC/HR was used to conduct the test, there is no question of bias or pushing agendas.
astroboy
Why isn't the differing bit-rate the reason for the differing scores?
Lyx
This has been discussed way too many times. Look up the test-thread and read it. You will find an answer to your question.

Short version: the varying bitrates are perfectly valid. That's the whole point of VARIABLE BIT-RATE. A good VBR encoder should raise the bitrate on difficult samples. If it doesn't, the encoder is actually worse. The encoder settings used all result in a similar AVERAGE bitrate across a large multi-genre music collection. The overall bitrate is NOT higher or lower for any codec. It is just that some encoders were clever enough to raise the bitrate for the TEST-SAMPLES while others were too stupid to do that. Thus, the perfect (utopian) encoder would average 128kbit across a large music collection, yet would recognize the difficulty of the test-samples, pump up the bitrate to something like 200kbit, and achieve a perfect 5.0 score.
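To put numbers on that, here is a minimal sketch in Python. All file sizes and durations are invented for illustration and are not figures from the test, and the helper name `average_bitrate_kbps` is hypothetical. It only shows how one setting can average about 128 kbps over a collection while spending far more on a difficult sample.

```python
def average_bitrate_kbps(tracks):
    """Overall bitrate of a list of (size_bytes, duration_seconds) tracks.

    The average is weighted by duration: total bits / total seconds.
    """
    total_bits = sum(size * 8 for size, _ in tracks)
    total_seconds = sum(duration for _, duration in tracks)
    return total_bits / total_seconds / 1000.0


# Hypothetical collection encoded at a single VBR quality setting.
collection = [
    (3_600_000, 240.0),  # easy track, about 120 kbps
    (4_080_000, 240.0),  # harder track, about 136 kbps
    (3_840_000, 240.0),  # typical track, about 128 kbps
]

# Hypothetical difficult 30-second test sample at the same setting.
hard_sample = [(750_000, 30.0)]

collection_kbps = average_bitrate_kbps(collection)
sample_kbps = average_bitrate_kbps(hard_sample)
print(f"collection: {collection_kbps:.1f} kbps, hard sample: {sample_kbps:.1f} kbps")
```

The two numbers answer different questions: the collection average is what the setting costs in disk space, while the sample figure only shows how the encoder reacted to one hard passage.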
astroboy
I do not agree with you at all, otherwise there would be no settings for target bit-rate when doing VBR encoding. There is also no way that a codec developer could know what the average music collection will look like now or in the future. Therefore they cannot determine the complexity of music that will be encoded. A better codec will make sacrifices to achieve the desired bit-rate, otherwise it may as well just go off on a tangent and output any bit-rate it likes.
Lyx
Are you trolling?

QUOTE(astroboy @ Apr 17 2007, 07:11) *

I do not agree with you at all, otherwise there would be no settings for target bit-rate when doing VBR encoding.

WTF? What do the available settings have to do with the validity and logic of the test?

QUOTE
There is also no way that a codec developer could know what the average music collection will look like now or in the future.

What does the codec developer have to do with that? Determining a similar average bitrate for the test is a job for the test organizer, and IT WAS DONE.

QUOTE
A better codec will make sacrifices to achieve the desired bit-rate, otherwise it may as well just go off on a tangent and output any bit-rate it likes.

Boy, get a clue about what VARIABLE BITRATE is about, and why the basic VBR-setting is not a "bitrate-target" but instead a "QUALITY-target". VBR does not target any bitrate at all. That's what ABR is for (for some reason meaning "average bitrate"). After you have educated yourself about the basics of how lossy codecs work, you can write statements about it.

It is kind of weird that you claim to be a long-time forum reader, yet you act as if you do not know those basic things about how lossy codecs work.


- Lyx
Sebastian Mares
There is a difference between ABR and VBR. When using VBR WMA with a target bitrate, you are in fact using ABR. A true VBR codec will not look to maintain a bitrate close to the one specified, but will try to maintain quality as high as possible. The problem with WMA is that its VBR engine, when used, does not come close to the bitrate of the tests - it's either way too high or way too low. Instead of using CBR, Roberto decided to use ABR.

In my test, I used CBR for the WMA Professional codec and according to the results at 128 kbps, the Professional codec is quite competitive.

Lyx was faster... >_<
LANjackal
@ astroboy: I'm a WMA user, and I think the test was pretty fair. Simply put, all current generation codecs perform their best at VBR, but they don't all encode to the same bitrate at those settings. As such, it is a bit tough to create an absolutely level playing field, but the testers did their best.
astroboy
So even though it says that WMA is being used at VBR 128kbps on the presentation page for that test, it should actually say WMA ABR?

In which case, MP3 VBR and WMA ABR cannot be compared meaningfully.

Eventually I will just have to test for myself anyway.. I think these public listening tests are meaningless and the only testing that should be encouraged is personal testing. Why would you choose a codec based on a test done by someone else when you can do it yourself? I need to decide between WMA and MP3 because my car doesn't play anything else and I like both formats over the competitors.
ShowsOn
QUOTE(astroboy @ Apr 17 2007, 15:08) *

In which case, MP3 VBR and WMA ABR cannot be compared meaningfully.

Why not? MS is constantly claiming that WMA at 64 Kbps is better than MP3 at 128 Kbps. Here is a controlled test that shows at around the same bitrate, LAME MP3 is better. Surely the conclusions of this controlled test are more valid than uncontrolled tests / marketing spin.
QUOTE(astroboy @ Apr 17 2007, 15:08) *

Eventually I will just have to test for myself anyway.. I think these public listening tests are meaningless and the only testing that should be encouraged is personal testing. Why would you choose a codec based on a test done by someone else when you can do it yourself? I need to decide between WMA and MP3 because my car doesn't play anything else and I like both formats over the competitors.

Where on this forum does it say that public listening tests are the only form of evidence one can use to determine codec use?

These public tests are simply examples of controlled testing that are a good guide as to how specific encoders perform. They are a lot better than uncontrolled tests based on subjective opinion, that are completely the result of placebo.
halb27
QUOTE(astroboy @ Apr 17 2007, 08:08) *

... I think these public listening tests are meaningless and the only testing that should be encouraged is personal testing. ...


No, public listening tests are not meaningless, but they necessarily have restricted relevance. Most crucial to me is the fact that an encoder can have a flaw which shows up in more than just extremely rare situations but which isn't reflected in the test samples. (But if there is awareness of the flaw, it's also a problem how this flaw should be adequately weighted within the samples.)
So this is all approximation to the truth, more exactly: valuable approximation to truth.

IMO the best way to deal with the outcome of listening tests is not to nitpick over the resulting numbers. On that basis, looking at the test you mentioned, WMA is to me absolutely on par with Lame. But it also shows, as was mentioned before, that Microsoft's claim of WMA (standard)'s superiority is ridiculous. Maybe this is the background for the anti-WMA attitude in this forum.

Thinking practically you can use mp3 with your car hifi like you use WMA standard and thus use the most compatible format (now and in the near future).

As for your VBR remarks: I agree with you that small differences in test outcome may be due to corresponding small bitrate differences that come out with the tested samples. This is not in contradiction with the fact that VBR mode is chosen carefully by the test organizers. It's a general fair comparison problem with VBR mode. But there is also a more general problem of which encoder setting to use. It's not always a matter of fact that the encoder setting used is really the optimal setting. So there's quite some uncertainty in this area, and the best thing to deal with it is again not to be nitpicking with the results.

And yes, I'd say public listening tests point into the right direction, but if you want to make sure what this exactly means to you you should do listening tests on your own.
But not everybody wants to do that and there's even a big danger doing so: you will become oversensitive to problems. As I know from experience this doesn't make life easier.
Lyx
QUOTE(halb27 @ Apr 17 2007, 09:58) *

It's a general fair comparison problem with VBR mode. But there is also a more general problem of which encoder setting to use.

It is not a fair comparison problem at all, because of the scenario to which we want to extrapolate such tests. Or in other words: the reason why we actually test. We do not test because of killer samples. We don't even test because of small samples in general. Who cares about that? What we use lossy codecs for is listening to music. We want to extrapolate the test results (after accounting for the increased difficulty of the test-samples) to real-world scenarios. In the real world, users don't encode small killer-samples for listening. In the real world, users encode entire music collections. The target for extrapolation is entire music collections - or in other words: music overall. Our hypothetical target-scenario looks like this:

- we have a 100gb hard-drive and N music albums.
- we want to encode them all to a lossy format, filling up all the available space
- now we want to know which codec will give us the most bang for the buck, quality/space wise

Even though this scenario is a bit contrived, it is valid, because it is the only reasonable (and thus fair) way to test lossy VBR codecs. VBR codecs don't target a specific bitrate. They target a certain quality. In a utopian world, tests would work the other way around: we would already have the perceptual quality of each codec and would just have to do the maths to see which codec can achieve a given quality over an entire collection with the least disk space used. Unfortunately, the real world works a bit differently. However, it is also clear that VBR is a very efficient and reasonable concept, and that it is therefore illogical and stupid to make all the codecs use the same average bitrate for the test samples - that would defeat the purpose of VBR - part of what makes one VBR codec better than another is that it should recognize difficult-to-encode parts and increase the bitrate to maintain a constant quality. So, our only choice left is a sort of compromise: take a music collection as large and varied as possible, then make it a test rule that all encoders used must encode this collection at the same overall bitrate. Even though this is against what VBR is about, it leaves the encoder a lot of freedom - enough freedom to, for example, recognize difficult-to-encode and easy-to-encode music genres. It also leaves them a lot of freedom to do what they are best at with the difficult test samples: adjust their bitrate.
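The calibration step in that compromise can be sketched roughly as follows (Python). This is only an illustration under stated assumptions: `toy_encoder_kbps` is a made-up linear stand-in for "encode the whole collection at this quality and measure the overall bitrate", and `calibrate_quality` is a hypothetical helper, not a description of how the test organizer actually picked the settings.

```python
def calibrate_quality(measure_kbps, target_kbps, low, high, tolerance=0.5):
    """Bisect a quality setting until the collection average is near the target.

    measure_kbps(quality) must increase with quality. In a real calibration it
    would encode the whole collection and return its overall bitrate.
    """
    for _ in range(60):
        mid = (low + high) / 2.0
        kbps = measure_kbps(mid)
        if abs(kbps - target_kbps) <= tolerance:
            return mid
        if kbps < target_kbps:
            low = mid   # too small on average: raise the quality
        else:
            high = mid  # too large on average: lower the quality
    return (low + high) / 2.0


def toy_encoder_kbps(quality):
    # Assumption: an invented linear model, not the behaviour of any real codec.
    return 40.0 + 20.0 * quality


quality = calibrate_quality(toy_encoder_kbps, 128.0, low=0.0, high=10.0)
print(f"quality {quality:.3f} gives about {toy_encoder_kbps(quality):.1f} kbps")
```

Only the collection-wide average is pinned down this way. What the encoder does on an individual test sample is left entirely to its own judgement.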

Comparing all testsamples at the same overall bitrate, is what would be really stupid and illogical. First it would fail to test how good a VBR codec is at being VBR. Second the results of the test would be totally meaningless, because it would have no relevance to the real world anymore. This has been discussed to death in the past. Get educated or STFU, damnit.

- Lyx
halb27
QUOTE(Lyx @ Apr 17 2007, 10:35) *

QUOTE(halb27 @ Apr 17 2007, 09:58) *

It's a general fair comparison problem with VBR mode. But there is also a more general problem of which encoder setting to use.

It is not a fair comparison problem at all...

???
As you describe in detail it's the nature of VBR that makes the bitrate used on a specific sample using different VBR encoders not well comparable.
As I said this doesn't invalidate the test at all, and the tests are certainly done with all the skills necessary.
The conclusion should be not to disbelieve these tests, but also not to nitpick over the resulting numbers. For the test mentioned: WMA standard was on par IMO with Lame (and most of the encoders tested).
I wasn't talking about problem samples, but as you are talking about sample selection: the selection of (normal) samples of course has an influence on the test outcome, especially if the outcome of some contenders is close. The same goes for the listeners (but that is covered by the usual analysis). None of this invalidates the tests, but we should be aware of the restrictions.

My solution to all this is simply: don't care about small differences in test results (and because of this I very much dislike the 'zoomed' view of test results, which inappropriately blows up small differences). My interpretation of the results of the aforementioned test is: Vorbis and MPC are best, Lame and iTunes and also WMA std. are second, ATRAC is worst. To me, the differences of a more detailed differentiation are within the natural volatility of test results - volatility not due to variation in listeners' perceived quality but due to the mentioned side conditions of the test itself.
My classification was only for clarification. More interesting are the practical implications: use MPC (if your DAP supports Rockbox) or Vorbis (if your DAP supports it) for the best quality. Don't worry about using mp3, as it provides very good quality. The same goes for using iTunes on an iPod. Microsoft's claims regarding WMA std. are BS, but on the other hand there's nothing wrong with people using it if they want to. Sony's ATRAC3 isn't really bad, but not really attractive either.
That's how I would interpret things in practice.
Lyx
Maybe i misunderstood you *unsure*. What i meant was that the varying bitrates in the test samples, between the different encoders, are not a sign of unfairness. More like the opposite. What makes VBR difficult to compare - the main problem - goes in the opposite direction: the music collection with which the encoders get calibrated. You cannot have an infinitely large collection with every music genre in existence in it. However, the larger the collection, the lower this uncertainty becomes. And as usual, with diminishing returns. At some point - and i think we reached it already - this side effect becomes overshadowed by other effects like error margins, etc. - so, in that regard, i agree with you: as with so many other things in life, perfection is impossible here - but "good enough" is good enough.

I also agree that the test results are often wrongly extrapolated and interpreted. First we have the zoomed images, which subjectively appear to inflate the differences. Then we have to take into account that those are results for "killer-samples". And lastly, the testing methodology makes the listeners spend way more focus and attention on the music than is usual in the real world (including multiple repeats of the same sample in a short amount of time). Thus, in the real world the encoders would actually perform much better than in such tests. Listening tests put codecs under way more pressure than is usual in everyday listening. The phrase "LAME V5 is transparent to most people" is actually an understatement under normal listening conditions.

- Lyx
halb27
QUOTE(Lyx @ Apr 17 2007, 11:55) *

... so, in that regard, i agree with you: as with so many other things in life, perfection is impossible here - but "good enough" is good enough. ...

I'm glad the day has come that we agree on something.
eofor
QUOTE(ShowsOn @ Apr 17 2007, 07:40) *

Why not? MS is constantly claiming that WMA at 64 Kbps is better than MP3 at 128 Kbps


To be fair, they stopped claiming that years ago. And while 64k WMA is definitely not better than VBR LAME anno 2007, I suspect it stacks up quite well against 128k Blade anno 1995.
Sebastian Mares
Also notice that Microsoft seems to favor CBR and not VBR - at least that is my impression. Even in the HE-AAC vs. WMA Professional 10 test Microsoft recommended NSTL to use CBR.
sld
QUOTE(astroboy @ Apr 17 2007, 11:36) *

Furthermore, there are a number of other tests conducted by others in both controlled and public environments that appear from my understanding to have come to the opposite conclusion to the anti-WMA sentiment in this forum.

Are you referring to the folly of the majority, the folly of obeying placebo in non-double-blind tests, and the existence of "paid-for" audiophile tests?

HA.org is a forum in which 99% of the time I do not have to take conclusions with teaspoonfuls of salt.
ff123
QUOTE(halb27 @ Apr 17 2007, 02:14) *

My solution to all this is simply: don't care about small differences in test results (and because of this I very much dislike the 'zoomed' view of test results, which inappropriately blows up small differences). My interpretation of the results of the aforementioned test is: Vorbis and MPC are best, Lame and iTunes and also WMA std. are second, ATRAC is worst. To me, the differences of a more detailed differentiation are within the natural volatility of test results - volatility not due to variation in listeners' perceived quality but due to the mentioned side conditions of the test itself.


For the samples tested, the group who listened, and the codecs/settings tested, Roberto's conclusion is correctly worded:

"Vorbis aoTuV is tied to Musepack at first place, Lame MP3 is tied to iTunes AAC at second place, WMA Standard is in third place and Atrac3 gets last place."

WMA should not be lumped in with the second tier.

ff123
halb27
QUOTE(ff123 @ Apr 17 2007, 16:59) *

... Roberto's conclusion ...
"Vorbis aoTuV is tied to Musepack at first place, Lame MP3 is tied to iTunes AAC at second place, WMA Standard is in third place and Atrac3 gets last place."
....

I agree grouping results as I did and as Roberto did is a problem as the numerical results represent a floating scale of qualitative results where it's subjective where to draw the exact borders.
I was aware of that when I wrote it, that's why I added the practical implications which are more of concern anyway.

But this is only a side effect of what I wanted to say: it's best not to care about small differences in test results.

Of course if codec A is only insignificantly better than codec B, codec B insignificantly better than codec C, codec C insignificantly better than codec D, and so on up to codec F, this does not mean that codec A is on par with codec F.
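A small sketch of that chain effect (Python). The scores and the threshold are invented for illustration and do not come from any listening test; the helper `on_par` is hypothetical.

```python
# Hypothetical mean scores and a made-up threshold below which a difference
# is treated as "insignificant". Neither comes from a real test.
scores = {"A": 4.5, "B": 4.3, "C": 4.1, "D": 3.9, "E": 3.7, "F": 3.5}
threshold = 0.3


def on_par(x, y):
    """True if the two codecs differ by less than the threshold."""
    return abs(scores[x] - scores[y]) < threshold


names = list(scores)

# Every codec is "on par" with its immediate neighbour...
neighbours_on_par = all(on_par(a, b) for a, b in zip(names, names[1:]))

# ...but the first and last codec are a full point apart.
ends_on_par = on_par("A", "F")

print(neighbours_on_par, ends_on_par)
```

"On par" is not transitive, which is why grouping by eye gets shaky once several results sit close together.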

Luckily the practical implications usually are not so prone to misunderstanding and subjective judgement, and that's what counts.
ff123
QUOTE(halb27 @ Apr 17 2007, 08:43) *

QUOTE(ff123 @ Apr 17 2007, 16:59) *

... Roberto's conclusion ...
"Vorbis aoTuV is tied to Musepack at first place, Lame MP3 is tied to iTunes AAC at second place, WMA Standard is in third place and Atrac3 gets last place."
....

I agree grouping results as I did and as Roberto did is a problem as the numerical results represent a floating scale of qualitative results where it's subjective where to draw the exact borders.
I was aware of that when I wrote it, that's why I added the practical implications which are more of concern anyway.

But this is only a side effect of what I wanted to say: it's best not to care about small differences in test results.

Of course if codec A is only insignificantly better than codec B, codec B insignificantly better than codec C, codec C insignificantly better than codec D, and so on up to codec F, this does not mean that codec A is on par with codec F.

Luckily the practical implications usually are not so prone to misunderstanding and subjective judgement, and that's what counts.


Um, then we actually don't agree.

Roberto correctly grouped the results. The vertical bars represent the 95% confidence intervals. So you can say with 95% confidence that Lame mp3/iTunes AAC was rated better than WMA standard. The difference was not "small" but rather "significant" (in the statistical sense).

Of course there is variability in the ratings. However, that variability did not obscure the significance of the results.
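For readers unfamiliar with the idea, here is a simplified sketch in Python of what "the intervals do not overlap" means. It is not the ANOVA procedure used for the test: it assumes a plain normal-approximation interval per codec, and the ratings are made up for illustration.

```python
import math
import statistics


def mean_with_ci95(ratings):
    """Return (mean, low, high) using a normal-approximation 95% interval."""
    mean = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - half_width, mean + half_width


# Made-up ratings on the 1.0 to 5.0 scale, not data from the test.
codec_x = [4.4, 4.6, 4.2, 4.5, 4.3, 4.6, 4.4, 4.5, 4.3, 4.4]
codec_y = [3.9, 4.0, 3.7, 4.1, 3.8, 3.9, 4.0, 3.8, 3.9, 3.7]

x_mean, x_low, x_high = mean_with_ci95(codec_x)
y_mean, y_low, y_high = mean_with_ci95(codec_y)

# If the lower bound of one codec sits above the upper bound of the other,
# the listener-to-listener spread does not explain the difference away.
intervals_separate = x_low > y_high
print(f"x: {x_low:.2f}..{x_high:.2f}  y: {y_low:.2f}..{y_high:.2f}  separate: {intervals_separate}")
```

The individual ratings still scatter from listener to listener, but the scatter is too small to account for the gap between the two means.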

ff123

QUOTE(astroboy @ Apr 16 2007, 20:36) *

Of course on top of this is the ability for the test, due to its public nature, to have been tainted by anyone who has an agenda against a particular codec. It also has significant error due to the variability of equipment used by the listeners. Another factor is that people who are used to a particular sound, such as that of iTunes encoded AAC files, may be more likely to rate that above the other codecs. Furthermore, there are a number of other tests conducted by others in both controlled and public environments that appear from my understanding to have come to the opposite conclusion to the anti-WMA sentiment in this forum.


Anybody with an "agenda against a particular codec" would have had to have broken the encryption used by abchr-java. And even if he managed to do so, that would only represent one result.

The variability of equipment is not a bug, it's a feature! The fact that significant results were achieved, in spite of the equipment variability, makes it that much more believable. Plus it's more representative of real life.

Show me the evidence that if I am familiar with a codec's sound, I am likely to rate it higher. If anything, being familiar with its shortcomings, I am likely to rate it lower.

Show me the test results which come to the opposite conclusion of Roberto's test (as regards WMA standard).

ff123
kwanbis
Wasn't Sebastian Mares' test more recent?

http://www.listening-tests.info/mf-128-1/results.htm
ff123
QUOTE(kwanbis @ Apr 17 2007, 09:49) *

Wasn't Sebastian Mares' test more recent?

http://www.listening-tests.info/mf-128-1/results.htm


Yes, but that tests wma-pro. The test linked in the first message tested wma-std
halb27
QUOTE(ff123 @ Apr 17 2007, 18:24) *

... So you can say with 95% confidence that Lame mp3/iTunes AAC was rated better than WMA standard. The difference was not "small" but rather "significant" (in the statistical sense). ....

This is about the reliability of the listening results due to the listening and judging part of the listeners.
This is not my point.
I'm into side conditions of the test itself: choice of samples, which can favor specific codecs to a certain degree, especially when VBR is used (it can happen that a codec, for instance, chooses an unusually high bitrate for a certain sample, which in this case favors this codec on this sample); choice of encoder settings, which can be disadvantageous for a certain codec (the question of VBR or ABR or even CBR for Lame, for instance, which at least for higher bitrates is not as obvious as many people think. It was quite interesting to see people's reaction to the 64 kbps test, where WMA pro in CBR mode came out in second place); and so on.
After all, there is a certain error margin, as Lyx called it, in the results, which has absolutely nothing to do with the usual statistical analysis, which addresses the reliability of the judgements of different listeners.
And that's why IMO it's best to ignore small differences in the outcome of a test.
Forget about the quality grouping which has its own issues.
Think directly about the practical implications from a test. These do not depend on small differences on the quality scale.
greynol
QUOTE(halb27 @ Apr 17 2007, 11:01) *
I'm into side conditions of the test itself: choice of samples, which can favor specific codecs to a certain degree, especially when VBR is used (it can happen that a codec, for instance, chooses an unusually high bitrate for a certain sample, which in this case favors this codec on this sample); choice of encoder settings, which can be disadvantageous for a certain codec (the question of VBR or ABR or even CBR for Lame, for instance, which at least for higher bitrates is not as obvious as many people think...

Oh really? Do you have any evidence to substantiate this outside of the 3 or 4 killer samples that you continuously mention?
halb27
QUOTE(greynol @ Apr 17 2007, 20:08) *

QUOTE(halb27 @ Apr 17 2007, 11:01) *

the question VBR or ABR or even CBR for Lame for instance which at least for higher bitrates is not so obvious as many people think...

Oh really? ....

As you know, I worded this sentence very softly compared to my much stronger opinion/experience on this topic. I don't want to bring that up here. Take it just as an example (a questionable one, if you like) of different encoder settings which can have an impact on a test's outcome.
rjamorim
QUOTE(greynol @ Apr 17 2007, 15:08) *

QUOTE(halb27 @ Apr 17 2007, 11:01) *
I'm into side conditions of the test itself: choice of samples, which can favor specific codecs to a certain degree, especially when VBR is used (it can happen that a codec, for instance, chooses an unusually high bitrate for a certain sample, which in this case favors this codec on this sample); choice of encoder settings, which can be disadvantageous for a certain codec (the question of VBR or ABR or even CBR for Lame, for instance, which at least for higher bitrates is not as obvious as many people think...

Oh really? Do you have any evidence to substantiate this outside of the 3 or 4 killer samples that you continuously mention?


When choosing samples, I try to address as many musical styles as possible, and that's it. If a sample favours a particular codec or trips it: tough luck!

When choosing encoder settings for LAME, I ask the LAME developers. I expect them to know much more about their encoder than anyone else, including halb27.
halb27
QUOTE(rjamorim @ Apr 17 2007, 21:45) *

When choosing encoder settings for LAME, I ask the LAME developers. ...

Sure that's the right thing to do. But it's not always that easy.

Think of a test in which Helix mp3 participates. There's the problem of whether to use the default setting, level's setting, or another setting. And no matter which setting is chosen, we will never know without preliminary trials whether Helix could have performed better with another setting.
Or look at the wma setting used in the test this thread is about. The quality estimation based vbr method was not used because an adequate setting yielding an average bitrate close to 128 kbps was not available. So the bitrate oriented vbr method was used (which in the opinion of many members should have given a penalty to WMA).
But even for Lame it's not that simple for low bitrate tests of say 96 kbps. There is a more widespread thinking that at such a bitrate ABR is to be preferred and not a low quality -V setting. It's an open question.

Taking it altogether there can be a problem in the choice of the encoder setting. From a test organizer's point of view of course you're always right letting the developers decide if available.
From the test result's side, however, it's best simply not to nitpick over the resulting numbers. With a roughly identical outcome in a specific test, two encoders should be considered to provide equal quality IMO.
astroboy
QUOTE(ff123 @ Apr 18 2007, 02:24) *

Anybody with an "agenda against a particular codec" would have had to have broken the encryption used by abchr-java. And even if he managed to do so, that would only represent one result.

Any large organisation with an agenda against Microsoft does not have to break encryption, they can break the application/implementation or get the sound output and the original cd and encode and compare. Although I doubt this happened if the N= is the sample size.

QUOTE

The variability of equipment is not a bug, it's a feature! The fact that significant results were achieved, in spite of the equipment variability, makes it that much more believable. Plus it's more representative of real life.

Variability is a bug: if the equipment happens to add artifacts to one sample and not the other, or makes both samples sound the same and leads to guessing the better one, then at low sample sizes it is less likely to be a 50/50 distribution.

QUOTE

Show me the evidence that if I am familiar with a codec's sound, I am likely to rate it higher. If anything, being familiar with its shortcomings, I am likely to rate it lower.

I have a Marantz CD-48 cd player. It's old and not exactly perfect, but has a lot of life and I compare everything to that, listening out for the strengths that this player has to see if other players compare, and ignoring the weaknesses.

QUOTE

Show me the test results which come to the opposite conclusion of Roberto's test (as regards WMA standard).

I didn't say the opposite of the test, I said the opposite of the anti-WMA sentiment. There are tests that are slightly different, e.g. extremetech, download.com, the 192 kbps being discussed atm... that suggest WMA is quite good.

The bottom line is, if N= the sample size it is too low to even come to any conclusions. On top of this, if there is no original source, people may be just comparing which codec filters out the imperfections that exist in the original rather than which codec includes the most detail from the original. And then there is the difference between WMA with ABR and MP3 with VBR, the different bit rates, and so on which are amplified when using small hard-to-encode sections. What are the maximum bit-rates of the sample files anyway? Why isn't that detail included in the test results page? And is it just a coincidence that the winners in each sample appear mainly just to be the ones with the highest average bit-rate? Of course not..

Given all of this, the test is neither scientifically valid, nor can it be relied on to form any conclusions at all. Despite this it is used constantly to give evidence towards MP3 being better than WMA.
rjamorim
QUOTE(astroboy @ Apr 18 2007, 08:17) *
Any large organisation with an agenda against Microsoft does not have to break encryption, they can break the application/implementation or get the sound output and the original cd and encode and compare. Although I doubt this happened if the N= is the sample size.


You really never checked how ABC/HR Java works, right?

QUOTE
Variability is a bug, as if the equipment happens to add artifacts to one sample and not the other, or makes both samples sound the same and leads to guessing the better one, then at low sample sizes it is less likely to be a 50/50 distribution.


ff123's point - that you missed completely - is that a test with subjects using only Sennheiser cans and E-Mu sound cards would only be meaningful to people that listen to music on Sennheiser cans and E-Mu sound cards. That would particularly piss me off since I have a Sound Blaster Live! and crapola Philips cans.

QUOTE
I have a Marantz CD-48 CD player. It's old and not exactly perfect, but it has a lot of life, and I compare everything to it, listening out for the strengths that this player has to see if other players compare, and ignoring the weaknesses.


That is not evidence, that just shows you are delusional and biased about the Marantz sound quality.

I agree with ff123 that the more you know a codec, the more likely you are to detect its particular artifacts.

QUOTE
I didn't say the opposite of the test, I said the opposite of the anti-WMA sentiment. There are tests that are slightly different, e.g. extremetech, download.com, the 192 kbps one being discussed atm... that suggest WMA is quite good.


The download.com test was useless in that the reference wasn't hidden and they did no statistical analysis whatsoever. The results (WMA128 performing better than OGG192 and even WMA192) seem to indicate the participants were just attributing random scores to each played sample, and didn't try to compare the samples with one another.

Couldn't find this extremetech test either. Wonder if they got it right.

QUOTE
The bottom line is, if N= is the sample size, it is too low to come to any conclusions.


You know statistics?

QUOTE
What are the maximum bit-rates of the sample files anyway? Why isn't that detail included in the test results page?


To the best of my knowledge, only MP3 has a tool that displays the biggest frame size (Encspot). You are welcome to find similar tools for Vorbis, AAC, WMA, etc.
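
For anyone who wants to look at peak bitrates in an MP3 themselves, here is a minimal sketch of the kind of frame scan such a tool could perform. It is not how Encspot is implemented, and the function name max_frame_bitrate is made up for illustration. It assumes MPEG-1 Layer III frames only, and it can be fooled by tag or cover-art data that happens to look like a frame header, so treat its output as a rough indication.

```python
# MPEG-1 Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 ("free format") and index 15 (invalid) are treated as unusable.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0]
SAMPLE_RATES_HZ = [44100, 48000, 32000, 0]  # index 3 is reserved


def max_frame_bitrate(data: bytes) -> int:
    """Return the highest per-frame bitrate (kbps) seen, or 0 if no frame is found.

    Hypothetical helper: handles MPEG-1 Layer III headers only.
    """
    pos = 0
    highest = 0
    while pos + 4 <= len(data):
        b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
        # 11 sync bits, then version bits 11 (MPEG-1) and layer bits 01 (Layer III).
        is_header = b0 == 0xFF and (b1 & 0xFE) == 0xFA
        kbps = BITRATES_KBPS[b2 >> 4]
        rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
        if not is_header or kbps == 0 or rate == 0:
            pos += 1  # not a usable header: resync one byte further on
            continue
        padding = (b2 >> 1) & 0x01
        frame_len = 144 * kbps * 1000 // rate + padding  # frame size in bytes
        highest = max(highest, kbps)
        pos += frame_len  # jump to where the next header should start
    return highest
```

Usage would be something like max_frame_bitrate(open("track.mp3", "rb").read()), where the file name is a placeholder. Note that the biggest frame is not the whole story for MP3, because the bit reservoir lets a frame borrow space from earlier frames.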

QUOTE
Given all of this, the test is neither scientifically valid, nor can it be relied on to form any conclusions at all. Despite this it is used constantly to give evidence towards MP3 being better than WMA.


Why don't you try and conduct your own listening tests then, oh wise dude?
halb27
Can't we close this down?

astroboy's intention was to have it confirmed that WMA standard is quality-wise on par with Lame at 128 kbps under the conditions of the test he mentioned.

To me the two test results are so close that I don't have any problem calling them 'on par' (though I personally wouldn't prefer WMA when I can use mp3 with the same output quality. mp3Gain and mp3directCut are two utilities I have often used with ready-made [by me] mp3 files, and I don't have these things when using WMA. Not to mention mp3's universal usability).

Does anybody see in this test a significant advantage of mp3 over wma quality-wise? (trying to forget that there has been further development with mp3 but not with wma std).

astroboy's other remark that there's a certain anti-WMA attitude in this forum can also be considered true I think. Most people's sympathy goes with a high-quality development by committed people who do it in their spare time, and not with the development of a huge company like Microsoft. Not to mention that mp3 is universally usable and wma is a vendor-specific codec.

But quality-wise: is there anybody who would give astroboy advice of the kind: No, don't use WMA std. @ 128 kbps - Lame is significantly better?
Rio
i had the same sort of dilemma between WMA and MP3 at ~128, but i settled for LAME 3.97b -V5 at that time

owing to the fact as stated earlier that WMA VBR is actually ABR would make me move away from WMA

i was just wondering what could have been the results if LAME was at --preset 128 vs WMA VBR (or ABR for that matter), maybe this is what astroboy is trying to find out

at any rate, i would still go for MP3. as a matter of fact, i settled back to old school FhG 128 CBR just to simplify things.

cheers!
halb27
QUOTE(Rio @ Apr 18 2007, 15:43) *

i had the same sort of dilemma between WMA and MP3 at ~128, but i settled for LAME 3.97b -V5 at that time

owing to the fact as stated earlier that WMA VBR is actually ABR would make me move away from WMA

i was just wondering what could have been the results if LAME was at --preset 128 vs WMA VBR (or ABR for that matter), maybe this is what astroboy is trying to find out

at any rate, i would still go for MP3. as a matter of fact, i settled back to old school FhG 128 CBR just to simplify things.

cheers!

Now you use CBR, but WMA ABR made you move away from WMA?

a) ABR is a variable bitrate mode like Lame's VBR. ABR is oriented on a target bitrate, around which the bitrate is chosen from frame to frame based on certain quality considerations. Lame's VBR, by contrast, makes a psy-model-based estimate of the quality and controls bitrate accordingly, with a user-given quality setting as the reference.

b) WMA, just like Lame, can make use of VBR and ABR. In the test mentioned here ABR was used. But you can use VBR if you like. WMA's VBR quality parameters just don't deliver popular average bitrates, which makes it hard to use within listening tests.

c) I once did a small WMA std. listening test at 96 kbps. WMA's ABR mode came out better than the VBR mode with the next-higher average bitrate. This doesn't say a lot, however, because I don't remember the details of the test. The question of which mode to prefer is in general not easy to answer and depends heavily upon the implementation of the encoder under consideration. For Lame, for instance, used at low bitrates around 100 kbps, there's a widespread opinion that ABR is to be preferred over VBR. I don't know whether that's correct, but my own (unfortunately pretty isolated) experience with ABR vs. VBR at very high bitrate (>190 kbps) says that ABR is to be preferred over VBR. guruboolez once gave a 160 kbps test which - looking at the results - points in the same direction. With an encoder like Nero AAC, on the other hand, I don't see any reason not to use VBR. Lame 3.98 alpha has improved on VBR too. If an encoder has a well-behaved VBR mode, it is to be preferred.

d) Contrary to its name, even CBR makes use of a variable audio data bitrate in a meaningful way. Audio data bitrate variation is in principle done like with ABR, but there are some restrictions. The higher the chosen bitrate, the less relevant these restrictions are. That's why I don't mind using CBR @ 192 kbps (though I would prefer using an ABR mode if available). I'm talking about mp3 here but I guess it's similar with many codecs.
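
To make points a) to d) a bit more concrete, here is a toy model of the three modes. It is only a sketch under loud assumptions: the per-frame "demand" numbers are invented, the function names are hypothetical, and no real encoder (Lame, WMA or otherwise) allocates bits this simply. In particular, a real ABR encoder steers towards its target on the fly, whereas this toy version scales the demands after seeing all of them.

```python
def vbr(demands, quality=1.0):
    """Quality-driven: each frame gets what the (hypothetical) psy model asks for."""
    return [d * quality for d in demands]


def abr(demands, target):
    """Target-driven: keep the frame-to-frame shape, scaled to hit the target average."""
    scale = target / (sum(demands) / len(demands))
    return [d * scale for d in demands]


def cbr(demands, target, reservoir_cap):
    """Fixed budget per frame; unused bits are saved (up to a cap) for later frames."""
    reservoir = 0
    spent = []
    for d in demands:
        available = target + reservoir
        used = min(d, available)
        reservoir = min(available - used, reservoir_cap)
        spent.append(used)
    return spent


# Invented per-frame bit demands: three easy frames and one hard one.
demands = [100, 100, 400, 100]
for name, bits in [("VBR", vbr(demands)),
                   ("ABR", abr(demands, target=150)),
                   ("CBR", cbr(demands, target=150, reservoir_cap=100))]:
    print(name, [round(b) for b in bits])
```

With these made-up numbers the hard frame gets 400 bits under VBR, about 343 under ABR and 250 under CBR, which is the ordering of "restrictions" described in d). Raising the CBR target shrinks the gap, which matches the remark that the restrictions matter less at higher bitrates.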
PoisonDan
QUOTE(halb27 @ Apr 18 2007, 14:47) *

To me the two test results are so close that I don't have any problem calling them 'on par'
The confidence bars do not overlap. The ANOVA analysis confirmed that LAME is clearly rated better than WMA standard (with a significant difference). ff123 himself confirmed Roberto's conclusion (please reread his posts in this thread). And you disagree with all that?

QUOTE
astroboy's other remark that there's a certain anti-WMA attitude in this forum can also be considered true I think.
I disagree. Sure, there may be some anti-Microsoft zealots on this board, but I think the majority of users who choose not to use WMA have objective, sensible reasons for that (I can also think of a few), not just blind hate towards the company.


QUOTE(rjamorim @ Apr 18 2007, 14:02) *

Couldn't find this extremetech test either. Wonder if they got it right.

I think this is the test he is talking about, it certainly wasn't flawless:
http://www.hydrogenaudio.org/forums/index....showtopic=20443
Lyx
QUOTE(PoisonDan @ Apr 18 2007, 16:57) *

QUOTE(halb27 @ Apr 18 2007, 14:47) *

To me the two test results are so close that I don't have any problem calling them 'on par'
The confidence bars do not overlap. The ANOVA analysis confirmed that LAME is clearly rated better than WMA standard (with a significant difference). ff123 himself confirmed Roberto's conclusion (please reread his posts in this thread). And you disagree with all that?

It may very well be the case that halb and you are talking about different things. You are (correctly) pointing out that there is a significant STATISTICAL difference. Halb, however, may quite possibly be talking about the practical consequences and the "big picture". The big picture - which becomes apparent in the non-zoomed diagrams - is that all modern codecs are rated very close to each other. The differences are statistically significant, yet still quite small. Small enough to - for practical purposes - come to the conclusion that from a quality point of view, it doesn't matter much which codec you choose, so other aspects (i.e. compatibility) become the deciding factor.

- Lyx
halb27
QUOTE(PoisonDan @ Apr 18 2007, 17:02) *

The confidence bars do not overlap. The ANOVA analysis confirmed that LAME is clearly rated better than WMA standard (with a significant difference). ...

I thought we were through with this: the analysis is about the reliability of the listeners' judgments.
It's quite interesting how often dedicated, correct statements (in this case on the reliability of the listeners' judgments covered by the analysis) get simplified and generalized into emotionally loaded nonsense statements (in this case a kind of belief in a general overall reliability of the test outcome).

QUOTE(Lyx @ Apr 18 2007, 17:06) *

It may very well be the case that halb and you are talking about different things. You are (correctly) pointing out that there is a significant STATISTICAL difference. Halb, however, may quite possibly be talking about the practical consequences and the "big picture". The big picture - which becomes apparent in the non-zoomed diagrams - is that all modern codecs are rated very close to each other. The differences are statistically significant, yet still quite small. Small enough to - for practical purposes - come to the conclusion that from a quality point of view, it doesn't matter much which codec you choose, so other aspects (i.e. compatibility) become the deciding factor.

- Lyx

That's exactly the point.
Rio
QUOTE(halb27 @ Apr 18 2007, 22:33) *

Now you use CBR, but WMA ABR made you move away from WMA?


i have got you confused sir

my mp3-wma dilemma was a long time ago (when i still had my pioneer car stereo with mp3/wma function, i just had to sell it)... i just opted to encode new rips via audiograbber using fhg 128 cbr recently (just like the good ol' days...)

cheers! cool.gif
ff123
QUOTE(halb27 @ Apr 18 2007, 08:16) *

QUOTE(PoisonDan @ Apr 18 2007, 17:02) *

The confidence bars do not overlap. The ANOVA analysis confirmed that LAME is clearly rated better than WMA standard (with a significant difference). ...

I thought we were through with this: the analysis is about the reliability of the listeners' judgments.
It's quite interesting how often dedicated, correct statements (in this case on the reliability of the listeners' judgments covered by the analysis) get simplified and generalized into emotionally loaded nonsense statements (in this case a kind of belief in a general overall reliability of the test outcome).


I think you mean to say "overall validity of the test outcome." Because it's likely that a similar test would produce the same outcome, in terms of codec rankings (test reliability). However, I don't disagree with the statement that although the differences are statistically significant, they are small (when compared with the entire ratings scale). It's up to the individual person to decide whether or not those differences are small enough to ignore.

Thus the following statements can be simultaneously true:

1. Lame mp3 is better than WMA-std.
2. It doesn't matter.

ff123
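
A small numeric illustration of how "significant" and "small" can both hold at once. The ratings below are made up, not data from Roberto's test, and the interval is a plain normal-approximation confidence interval on the mean, not the ANOVA-based analysis used for the real results. The helper names are hypothetical.

```python
import math
import statistics


def mean_ci(ratings, z=1.96):
    """Return (low, high): an approximate 95% confidence interval for the mean."""
    centre = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return centre - half_width, centre + half_width


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def gap_as_fraction_of_scale(ratings_a, ratings_b, scale_min=1.0, scale_max=5.0):
    """Difference between the two means as a fraction of the full rating scale."""
    gap = abs(statistics.mean(ratings_a) - statistics.mean(ratings_b))
    return gap / (scale_max - scale_min)


# Invented ratings on a 1.0-5.0 scale, 30 scores per codec (NOT real test data).
codec_a = [4.1, 4.2, 4.3] * 10
codec_b = [3.8, 3.9, 4.0] * 10

print(intervals_overlap(mean_ci(codec_a), mean_ci(codec_b)))
print(round(gap_as_fraction_of_scale(codec_a, codec_b), 3))
```

With these invented scores the two intervals do not overlap, yet the gap between the means is only 0.3 points, i.e. 7.5% of the 1-5 scale. Whether that is small enough to ignore is exactly the judgment call left to the individual.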
JeanLuc
QUOTE(astroboy @ Apr 17 2007, 06:11) *

I do not agree with you at all, otherwise there would be no settings for target bit-rate when doing VBR encoding.


VBR encoding without a target bitrate is called "lossless encoding" ... that's why there are no target-bitrate settings for lossy VBR encodings.

ABR encoding has a target bitrate ... but ABR is nothing more than VBR with more 'rigid' bitrate limitations.
halb27
QUOTE(ff123 @ Apr 18 2007, 17:32) *

I think you mean to say "overall validity of the test outcome." Because it's likely that a similar test would produce the same outcome, in terms of codec rankings (test reliability). However, I don't disagree with the statement that although the differences are statistically significant, they are small (when compared with the entire ratings scale). It's up to the individual person to decide whether or not those differences are small enough to ignore.

Thus the following statements can be simultaneously true:

1. Lame mp3 is better than WMA-std.
2. It doesn't matter.

ff123

You're right: validity is the more adequate word. I'm glad you understand what I say.
I only see it slightly differently: from a well organized test (which IMO the test under discussion is) I expect that, when it is repeated under other circumstances (other test samples, or other encoder settings where that's relevant), it gives similar results, not the same results. If the outcome in test A is 3.9 for encoder A and 4.1 for encoder B, I can imagine the results being reversed in test B. Nobody has ever done such a test AFAIK, so I don't really know whether this is correct, but I totally agree with your conclusion: it's up to the individual person to decide whether or not small differences are small enough to ignore. And when the differences within a specific test are small, practical considerations are of more concern to me.
halb27
QUOTE(JeanLuc @ Apr 18 2007, 18:23) *

VBR encoding without a target bitrate is called "lossless encoding" ... that's why there are no target-bitrate settings for lossy VBR encodings.

ABR encoding has a target bitrate ... but ABR is nothing more than VBR with more 'rigid' bitrate limitations.

???
Lossy VBR the way Lame does it really has no target bitrate. It just turns out that a certain quality setting usually results in a certain average bitrate range on individual tracks and a certain average bitrate on a hopefully representative collection of tracks.
ABR usually turns out to have a more restricted bitrate variation than VBR, but I think this is not necessarily the case.
But the main difference is: VBR estimates quality with the psy model as the basis. This has the advantage that bitrate is always chosen adequately when the psy model works fine (which is usually true), and the disadvantage that possible flaws in the psy model (and its implementation) are amplified.
So there's a significant difference between ABR and VBR.
ABR works much more like CBR, which varies the audio data bitrate as well. ABR is CBR without the restrictions of CBR regarding audio data bitrate variation.
I'm talking about the audio data stream here, ignoring the transport frame representation, which has no audio impact.