Multiformat Listening Test @ 64 kbps - FINISHED
kdo
post Sep 6 2007, 13:54
Post #76





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



All of a sudden, I have got a small question -- about the error bars on all the plots.

If we compare the plots for two different samples, the error bars are shorter for the sample with more listeners. This makes sense. (More listeners --> more representative statistics --> less error) Ok.

But if we look at just one plot (any one of the plots), it seems the error bars of all 5 contenders have exactly the same size. Are they actually exactly the same? Is it how it's supposed to be due to the design of the test?
Are there any circumstances when error bars could have different size for different contenders?
Sebastian Mares
post Sep 6 2007, 15:29
Post #77





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



Within a sample plot, all bars should have the same size - always.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
kdo
post Sep 6 2007, 16:06
Post #78





Group: Members (Donating)
Posts: 304
Joined: 18-April 02
From: Russia
Member No.: 1812



QUOTE (Sebastian Mares @ Sep 6 2007, 16:29) *
Within a sample plot, all bars should have the same size - always.

Somehow this feels counter-intuitive.

Imagine an extreme case where one contender is rated 3.0 by ALL listeners (i.e. all of them give exactly the same rating), but another contender gets different ratings between 1.0 and 5.0.
Why should the error bars be equal?

(I don't doubt the results, just want to understand a little deeper.)
Sebastian Mares
post Sep 6 2007, 17:20
Post #79





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



Maybe someone with more knowledge in statistics can answer your question.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
robert
post Sep 6 2007, 18:00
Post #80


LAME developer


Group: Developer
Posts: 783
Joined: 22-September 01
Member No.: 5



Who said all bars should be equal? What do you want the bars to represent?

some boxplot example: http://www.physics.csbsju.edu/stats/box2.html
Sebastian Mares
post Sep 6 2007, 23:06
Post #81





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
naylor83
post Sep 6 2007, 23:11
Post #82





Group: Members
Posts: 204
Joined: 19-June 05
From: Uppsala, Sweden
Member No.: 22842



QUOTE (Sebastian Mares @ Sep 7 2007, 00:06) *
In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.


If the bars are supposed to indicate the quartiles they should vary a bit. But I haven't checked what those bars are supposed to be...


--------------------
davidnaylor.org

Vorbis Q4, please. AoTuv b5, preferably.
ff123
post Sep 7 2007, 03:48
Post #83


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



For this type of analysis, the error bars are all the same size. Another way you can do the analysis is to have a different confidence range for every comparison. So for the 5 codecs (including the anchors), you would have 10 different numbers. This can be represented well in matrix table format, but not nicely in a graph format. If you want to get matrix type confidence ranges, download the bootstrap program from my site:

http://ff123.net/bootstrap/

which performs this type of analysis. In practice, the two types of analyses yield very similar results.
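As a rough illustration of the matrix idea (not the actual algorithm of the bootstrap program linked above, which is not described in this thread), the sketch below lists the 10 pairs that 5 contenders produce and computes a percentile-bootstrap confidence range for each paired mean difference. The codec names and ratings are made up for the example, and `pairwise_bootstrap_ci` is a hypothetical helper.

```python
import itertools
import random
import statistics

def pairwise_bootstrap_ci(ratings, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap range for the mean rating difference of each codec pair.

    ratings: dict mapping codec name -> list of ratings, one per listener,
             with listeners in the same order for every codec (paired data).
    Returns a dict mapping (codec_a, codec_b) -> (low, high).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(next(iter(ratings.values())))
    result = {}
    for a, b in itertools.combinations(ratings, 2):
        # Paired differences: same listener, two codecs.
        diffs = [x - y for x, y in zip(ratings[a], ratings[b])]
        means = []
        for _ in range(n_boot):
            # Resample listeners with replacement.
            resample = [diffs[rng.randrange(n)] for _ in range(n)]
            means.append(statistics.fmean(resample))
        means.sort()
        low = means[int((alpha / 2) * n_boot)]
        high = means[int((1 - alpha / 2) * n_boot) - 1]
        result[(a, b)] = (low, high)
    return result

# Hypothetical ratings from 6 listeners for 5 contenders (including anchors).
example_ratings = {
    "codec_a": [4.5, 4.0, 4.2, 4.8, 3.9, 4.4],
    "codec_b": [4.1, 3.8, 4.0, 4.5, 3.5, 4.2],
    "codec_c": [3.6, 3.9, 3.2, 4.0, 3.1, 3.7],
    "high_anchor": [4.9, 4.7, 5.0, 4.8, 4.6, 5.0],
    "low_anchor": [1.5, 2.0, 1.2, 1.8, 1.0, 1.6],
}

cis = pairwise_bootstrap_ci(example_ratings)
for pair, (low, high) in cis.items():
    print(pair, round(low, 2), round(high, 2))
```

A pair whose range excludes zero would be read as a distinguishable pair under this sketch. Each pair gets its own range, which is why a table fits the output better than a single bar chart.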
robert
post Sep 7 2007, 10:58
Post #84


LAME developer


Group: Developer
Posts: 783
Joined: 22-September 01
Member No.: 5



So the bars do not represent the distribution of data collected for each codec, as, for example, you could have one codec rated by all people 5.0 and you'll add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?
Moguta
post Sep 8 2007, 02:56
Post #85





Group: Members
Posts: 243
Joined: 26-June 02
Member No.: 2395



I would've loved to see MP3 involved in this test. We know that Vorbis, AAC, and WMA are better, but just as a comparison it's always interesting to see how the newer, improved codecs rate nowadays against our friendly ol' MP3 format, to know exactly how much of an improvement there is.
ff123
post Sep 8 2007, 03:38
Post #86


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (robert @ Sep 7 2007, 02:58) *
So the bars do not represent the distribution of data collected for each codec, as, for example, you could have one codec rated by all people 5.0 and you'll add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?


If the bottom of the bar of one codec does not touch the top of the bar of another codec, you can state with at least 95% confidence that the first codec is better than the second one.

The bars being all the same size means that you might lose a bit of power in making statistical distinctions between codecs. But I think that's more than balanced by having the nice, easy-to-look at pictures instead of tables of numbers.

There are some who assert (and they have a point) that even if there are statistical differences between codecs, it may not make a practical difference if the ratings are relatively close to each other (close being determined by looking at the pictures and making a judgment).
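A minimal sketch of how equal-length bars can arise and how to read them. It assumes a blocked ANOVA (listeners as blocks) with Fisher's LSD, which is an assumption about the analysis and is not confirmed in this thread. Under that assumption the residual variance is pooled over all codecs, so a single half-width applies to every contender, even one that every listener rated identically. The function names are hypothetical, and `t_crit` has to be looked up by the caller in a t table for (listeners - 1) * (codecs - 1) degrees of freedom.

```python
import math

def common_half_width(table, t_crit):
    """Half of Fisher's LSD for a listeners-by-codecs rating table.

    table:  list of rows, one row per listener, one rating per codec.
    t_crit: two-sided critical t value for (n - 1) * (k - 1) degrees of
            freedom, supplied by the caller (assumption: taken from a table).
    """
    n = len(table)       # number of listeners (blocks)
    k = len(table[0])    # number of codecs (treatments)
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    # Residual sum of squares after removing listener and codec effects.
    sse = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))       # pooled error variance
    lsd = t_crit * math.sqrt(2 * mse / n)  # least significant difference
    return lsd / 2                         # drawn as +/- around each mean

def bar_clearly_above(mean_a, mean_b, half_width):
    """True if the bottom of A's bar is above the top of B's bar."""
    return (mean_a - half_width) > (mean_b + half_width)
```

With bars drawn this way, two bars fail to touch exactly when the two means differ by more than the LSD, which matches the reading rule given above.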
muaddib
post Sep 26 2007, 13:40
Post #87





Group: Developer
Posts: 398
Joined: 14-October 01
Member No.: 289



QUOTE (muaddib @ Aug 16 2007, 11:07) *
QUOTE (kennedyb4 @ Aug 16 2007, 03:58) *
It seems that Itunes at 96 VBR has outscored Itunes 128 CBR from the previous multi-format test.
That's a substantial improvement unless the difficulty of the samples is not comparable.
Different samples, different participants. Just look at how personal results posted here differ from the average.
Results from different listening tests are just not easily comparable.

Sorry for bringing this up again, but I have one more note about this. iTunes 96 kbps VBR was used in this test at 64 kbps and in the previous one at 48 kbps. Some samples were used in both tests, yet the scores for those samples are not the same (example: Toms Diner 4.70 vs 4.86) even though the decoded sample is the same. Even participants involved in both tests didn't give the same ratings (examples: Alex B 4.0 vs 4.2, haregoo 5.0 vs 4.5).
Unfortunately it is not possible to get consistent results :(
Sebastian Mares
post Sep 26 2007, 14:28
Post #88





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



Yes, this is normal. It depends on mood, listening conditions (maybe different headphones or soundcard, possible noise from the neighbors, etc.) and health (maybe the listener just got better from a cold or still has a cold while testing).


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
benwaggoner
post Apr 10 2008, 23:25
Post #89





Group: Members
Posts: 16
Joined: 19-June 04
From: Portland, OR
Member No.: 14779



Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


--------------------
Ben Waggoner (http://on10.net/blogs/benwagg/)
benski
post Apr 10 2008, 23:29
Post #90


Winamp Developer


Group: Developer
Posts: 669
Joined: 17-July 05
From: Ashburn, VA
Member No.: 23375



QUOTE (benwaggoner @ Apr 10 2008, 18:25) *
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


I would agree here. Streaming is the main use so far for 64kbps. Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.

This post has been edited by benski: Apr 10 2008, 23:30
benwaggoner
post Apr 11 2008, 02:57
Post #91





Group: Members
Posts: 16
Joined: 19-June 04
From: Portland, OR
Member No.: 14779



QUOTE (benski @ Apr 10 2008, 14:29) *
QUOTE (benwaggoner @ Apr 10 2008, 18:25) *

Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


I would agree here. Streaming is the main use so far for 64kbps. Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.

How are you measuring CPU use/battery drain of the codecs? We've done a ton of work for the mobile implementations of WMA Pro to get the CPU hit low enough to make it feasible for phone use. I haven't done any formal testing with recent devices though.


--------------------
Ben Waggoner (http://on10.net/blogs/benwagg/)
Sebastian Mares
post Apr 11 2008, 18:43
Post #92





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA. Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes for short samples are also not an option - using 2-pass must be done on complete tracks and then samples have to be extracted out of the encoded full tracks.
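The bitrate criterion mentioned here can be sketched as a simple check. The 10% tolerance comes from the post; the function name and the bitrate figures in the comments are hypothetical and are not measurements from the test.

```python
def deviates_too_much(candidate_kbps, other_kbps, tolerance=0.10):
    """True if a candidate's bitrate is more than `tolerance` away
    from the average bitrate of the other encoders."""
    reference = sum(other_kbps) / len(other_kbps)
    return abs(candidate_kbps - reference) / reference > tolerance

# Made-up numbers: the other encoders average 64 kbps.
# A 72 kbps result is 12.5% off and would be rejected under this rule,
# while a 66 kbps result is about 3% off and would pass.
```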

A pure CBR test could be interesting for streaming indeed.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
benwaggoner
post Apr 11 2008, 21:20
Post #93





Group: Members
Posts: 16
Joined: 19-June 04
From: Portland, OR
Member No.: 14779



QUOTE (Sebastian Mares @ Apr 11 2008, 09:43) *
The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA.

Do we? Do you have a link - I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.

QUOTE
Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes for short samples are also not an option - using 2-pass must be done on complete tracks and then samples have to be extracted out of the encoded full tracks.

Hmmm. How short are the clips you're using? If you can give me a reproducible test for this, I'll pass it on to our engineers. In my experience, VBR audio comes out within 1% of the target, but I'm normally encoding at least 60 second clips.

2-pass VBR peak limited might work better in this case. But if you need to use CBR, at least use 2-pass.

QUOTE
A pure CBR test could be interesting for streaming indeed.

Great, I'd love to see that as well.

For the WMA codecs, the proper mode to use for that (unless it's a test of live encoders) would be 2-pass CBR. We are able to get a meaningful reduction in peak QP with 2-pass CBR.


--------------------
Ben Waggoner (http://on10.net/blogs/benwagg/)
Sebastian Mares
post Apr 11 2008, 21:45
Post #94





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason for doing so: to obtain the best quality results.

If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.

Could you explain to me what multi-pass CBR is supposed to do? I thought multi-pass encoding was good for ABR only. For CBR you always assign the same number of bits (I don't know if WMA has something like a bit reservoir; in case it does, I imagine that could be the only variable thing that could be influenced by multi-pass encoding).
As for bitrate-based VBR (which I call ABR), I would prefer to encode full tracks and then extract the sample from that. Otherwise the test is of little or no use.


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
benwaggoner
post Apr 11 2008, 23:28
Post #95





Group: Members
Posts: 16
Joined: 19-June 04
From: Portland, OR
Member No.: 14779



QUOTE (Sebastian Mares @ Apr 11 2008, 12:45) *
The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason for doing so: to obtain the best quality results.
That test was done before my time, but my understanding is that we used 1-pass CBR in that case as that was the only rate-controlled mode supported by HE AAC, and the goal was to have an apples-to-apples test. It was never meant to be a demonstration of best practices. 1-pass CBR is certainly the most challenging codec mode, so it's interesting to test, but it's not something I use other than for live encoding.

QUOTE
If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.
Understood. I just want to help make future tests a more scenario-relevant comparison.

QUOTE
Could you explain to me what multi-pass CBR is supposed to do? I thought multi-pass encoding was good for ABR only. For CBR you always assign the same number of bits (I don't know if WMA has something like a bit reservoir; in case it does, I imagine that could be the only variable thing that could be influenced by multi-pass encoding).
Correct. With 2-pass CBR, you're able to essentially request a bigger bit reservoir in advance of complex audio, to keep worst-case QP lower. With 2-pass VBR, we essentially calculate the QP that will come closest to the optimum bitrate, and then vary QPs per block a little in order to hit the target. But in essence an unconstrained 2-pass VBR is a lot like a "magic" way to figure out what quality level to use to give a file of the requested size.
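The "figure out what quality level gives the requested size" idea can be illustrated with a toy model. This is only a sketch under loud assumptions: the rate model in `total_bits` is invented for the example and has nothing to do with how the WMA encoder actually allocates bits, and both function names are hypothetical. The first pass is represented by a list of per-block complexity values; the second pass searches for the single quantizer value whose predicted size matches the target.

```python
import math

def total_bits(complexities, qp):
    """Toy rate model (an assumption): a block of complexity c costs
    1000 * log2(1 + c / qp) bits, so a coarser quantizer spends fewer bits."""
    return sum(1000 * math.log2(1 + c / qp) for c in complexities)

def find_qp(complexities, target_bits, low=0.01, high=1000.0, iterations=60):
    """Bisection for the qp whose predicted size meets target_bits.

    Works because total_bits decreases monotonically as qp grows.
    Assumes the target lies between the sizes predicted at `low` and `high`.
    """
    for _ in range(iterations):
        mid = (low + high) / 2
        if total_bits(complexities, mid) > target_bits:
            low = mid    # too many bits: quantize more coarsely
        else:
            high = mid   # at or under budget: try a finer quantizer
    return (low + high) / 2

# "First pass" output: made-up complexity per block.
blocks = [5.0, 50.0, 500.0, 20.0]
qp = find_qp(blocks, target_bits=8000)
print(round(qp, 3), round(total_bits(blocks, qp)))
```

In this toy model the hard block (complexity 500) still receives far more bits than the easy ones at the chosen quantizer, while the whole file lands on the requested size.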

QUOTE
As for bitrate-based VBR (which I call ABR), I would prefer to encode full tracks and then extract the sample from that. Otherwise the test is of little or no use.
Makes sense to me.

Moderation: Fixed quotes.

This post has been edited by greynol: Apr 11 2008, 23:35


--------------------
Ben Waggoner (http://on10.net/blogs/benwagg/)
hellokeith
post Apr 12 2008, 08:27
Post #96





Group: Members
Posts: 288
Joined: 14-August 06
Member No.: 34027



QUOTE (benwaggoner @ Apr 11 2008, 15:20) *
Do we? Do you have a link - I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.


Hi Ben,

Nice to see you here at HA. :) I think you'll find this place somewhat subdued compared to AVSF...

Interesting you speak of 2-pass VBR WMA. I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable. I think perhaps it is underrated/underused in the lossless community, though it wasn't trivial to get the VBS command line options all sorted out. :rolleyes: The reason I ended up with ~128 kbps 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.
IgorC
post Apr 12 2008, 12:55
Post #97





Group: Members
Posts: 1506
Joined: 3-January 05
From: Argentina, Bs As
Member No.: 18803



I tried 1-pass and 2-pass CBR wma10 at 64 kbit/s in the past. I didn't share the results here. There were miscellaneous changes, but I couldn't ABX the difference.
So maybe 2-pass has a bigger reservoir and some other kind of grass called "magic", but it makes no sense for audio CBR encoding. If anyone disagrees, please provide samples where 2-pass CBR is better than 1-pass for wma10.

This post has been edited by IgorC: Apr 12 2008, 12:56
benwaggoner
post Apr 13 2008, 06:19
Post #98





Group: Members
Posts: 16
Joined: 19-June 04
From: Portland, OR
Member No.: 14779



QUOTE (hellokeith @ Apr 11 2008, 23:27) *
Nice to see you here at HA. :) I think you'll find this place somewhat subdued compared to AVSF...

Thank goodness :)!

QUOTE
Interesting you speak of 2-pass VBR WMA. I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable. I think perhaps it is underrated/underused in the lossless community, though it wasn't trivial to get the VBS command line options all sorted out. :rolleyes: The reason I ended up with ~128 kbps 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.

Cool, glad it's working out for you.

I'd probably recommend using -a_mode 4 and setting a peak bitrate instead of leaving it entirely unconstrained, since devices may have a maximum supported rate. For Zune, it's 320 for audio-only files, and 192 for soundtracks in WMV files, IIRC.

Stuff like stereo separation is a great thing to use VBR for, since it gets you the bits where you need them. I think people spend so much time sweating the hard clips that they can miss that most of most full tracks aren't that hard.


--------------------
Ben Waggoner (http://on10.net/blogs/benwagg/)
vinnie97
post Apr 14 2008, 08:30
Post #99





Group: Members
Posts: 472
Joined: 6-March 03
Member No.: 5360



I'm still anxiously awaiting the forthcoming ~80kbps multiformat test, especially now that Aoyumi has just released beta 5.5 to infuse more life into Vorbis. :)