Multiformat Listening Test @ 128 kbps - FINISHED |
Jan 17 2006, 20:55
Post
#151
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE Perhaps this question has been asked many times, but I'd like to know what it is that makes this aoTuVb4.51 version so extremely good compared to the libvorbis 1.12 version? I even hear "the resurrection of Vorbis" is due to aoTuV! If the aoTuVb4.51 has a score of 4,79 in this test, what average score would the libvorbis 1.12 have compared to the aoTuV and all the others!??
Aoyumi does a terrific job tweaking the Noise Normalization code and the bitrate allocation scheme. This is not just for the community; it's also a Xiph bounty, don't forget. In terms of streaming, a lot of people perceptually prefer Noise Normalization, which sounds more natural than SBR or PNS. I think the AAC and Vorbis psychoacoustic models are both unique and different from a technical perspective, but what can be seen is that they are more perceptually advanced today than what we would have seen 5 years ago. These listening tests are a clear example of this. You are never going to achieve 100% transparency in every case, but a lot of samples give you a great indication of the performance of the encoder.
This post has been edited by HotshotGG: Jan 17 2006, 20:58
-------------------- College student/IT Assistant
|
|
|
|
Jan 17 2006, 21:11
Post
#152
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
QUOTE (kwanbis @ Jan 17 2006, 07:24 PM)
I wonder how you always manage to screw up the quotes... Anyway, the RAR and the XLS were updated and have the same names, so just redownload the files.
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Jan 17 2006, 21:14
Post
#153
|
|
![]() Group: Members Posts: 3353 Joined: 6-July 03 From: Sachsen (DE) Member No.: 7609 |
QUOTE I even hear "the resurrection of Vorbis" is due to aoTuV!
Apart from promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", imho, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot. - Lyx
This post has been edited by Lyx: Jan 17 2006, 21:14
-------------------- I am arrogant and I can afford it because I deliver.
|
|
|
|
Jan 17 2006, 21:14
Post
#154
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
QUOTE (sehested @ Jan 17 2006, 07:05 PM)
QUOTE (Garf @ Jan 15 2006, 06:29 AM) How many of the grades given were 5.0? And how many if you re-add the ranked references as meaning that codec got a 5.0 for that sample? So, there's 403 valid test results times 5 codecs (Shine doesn't count), or about 2015 grades. How many of those are 5.0, i.e. perfectly transparent?
@Sebastian: Thanks for the test results. I managed to process the results and produce this table:
CODE
Ranked refs         24   14   18    6   19   25
5.0's              304  260  299   36  313  302
5.0's %            75%  65%  74%   9%  78%  75%
4.0 and above      361  334  358   60  375  355
4.0 and above %    90%  83%  89%  15%  93%  88%

Since one of the results was invalid (for sample 8, as you can read in my previous posts and on the updated results page), the new number of valid results is 402. Your XLS (which you sent to me via e-mail) shows the old number, 403. This is because Shade[ST] sent two different results for the same sample. Both results were valid (they didn't contain ranked references, IIRC), but only one of them was used.
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
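The percentage rows in sehested's table can be re-derived from the raw counts. A minimal Python sketch, assuming the denominator is the 403 results the table was built from (the pre-correction total mentioned in the post); the column order is kept exactly as posted, since the table itself carries no codec header:

```python
# Counts copied from the table in the post; column order as posted.
VALID_RESULTS = 403  # assumption: the pre-correction total used for the table

fives = [304, 260, 299, 36, 313, 302]      # grades of exactly 5.0
four_plus = [361, 334, 358, 60, 375, 355]  # grades of 4.0 and above


def as_percent(counts, total):
    """Return each count as a whole-number percentage of total."""
    return [round(100 * c / total) for c in counts]


fives_pct = as_percent(fives, VALID_RESULTS)
four_plus_pct = as_percent(four_plus, VALID_RESULTS)

print(fives_pct)
print(four_plus_pct)
```

With 403 as the denominator, the computed percentages agree with the two percentage rows in the table.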
|
|
|
|
Jan 17 2006, 21:37
Post
#155
|
|
|
Group: Developer (Donating) Posts: 2332 Joined: 28-June 02 From: Argentina Member No.: 2425 |
QUOTE (Sebastian Mares @ Jan 17 2006, 08:11 PM) I wonder how you always manage to screw up the quotes...
I wonder that too.
-------------------- MAREO: http://www.webearce.com.ar
|
|
|
|
Jan 17 2006, 21:42
Post
#156
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE Apart from promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", imho, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot.
Yeah, because they were assigned projects from now to 2015 (that's an exaggeration). Besides, it's a community project anyway. In that time I think I have seen all but two of the bounties completed: 5.1 and bitrate peeling. The reason is that those two have more to do with the scope of the low-level libraries and the encoder than with other things. Our good friend John33 did a good job fixing the channel mapping, though, so it's halfway there. Anyway, ignore my rantings and continue on with the listening test discussion.
This post has been edited by HotshotGG: Jan 17 2006, 21:44
-------------------- College student/IT Assistant
|
|
|
|
Jan 17 2006, 23:22
Post
#157
|
|
![]() Group: Members Posts: 913 Joined: 15-December 01 From: Germany Member No.: 662 |
I'm having some trouble with the comments I see on the results page.
On top it says: "One codec can be said to be better than another with 95% confidence if the bottom of its segment is at or above the top of the competing codec's line segment." In other words, they don't overlap. With which I agree, but right on the first plot it says: "iTunes is not as good as AoTuV", but the confidence intervals do overlap. What is correct? |
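The rule quoted from the results page can be written down directly. A minimal Python sketch, using made-up interval endpoints (the names `codec_x`, `codec_y`, `codec_z` and their numbers are purely hypothetical and are not values from the test):

```python
def is_better(a, b):
    """True if interval a = (low, high) lies entirely at or above interval b."""
    a_low, _ = a
    _, b_high = b
    return a_low >= b_high


def compare(a, b):
    """Classify two 95% confidence intervals under the non-overlap rule."""
    if is_better(a, b):
        return "first is better"
    if is_better(b, a):
        return "second is better"
    return "tied"  # the intervals overlap, so no winner at 95% confidence


# Hypothetical (low, high) intervals, for illustration only.
codec_x = (4.70, 4.88)
codec_y = (4.62, 4.80)
codec_z = (2.20, 2.50)

print(compare(codec_x, codec_y))  # overlapping intervals
print(compare(codec_x, codec_z))  # clearly separated intervals
```

Under this rule, two codecs whose segments overlap are only ever "tied", however far apart their means look.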
|
|
|
Jan 18 2006, 00:47
Post
#158
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (Lyx @ Jan 18 2006, 06:14 AM) QUOTE I even hear "the resurrection of Vorbis" is due to aoTuV! Apart from promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", imho, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot. - Lyx
We should mention Nyaochi too and his "modest tuning".
|
|
|
Jan 18 2006, 00:50
Post
#159
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (HotshotGG @ Jan 18 2006, 05:55 AM) QUOTE Perhaps this question has been asked many times, but I'd like to know what it is that makes this aoTuVb4.51 version so extremely good compared to the libvorbis 1.12 version? I even hear "the resurrection of Vorbis" is due to aoTuV! If the aoTuVb4.51 has a score of 4,79 in this test, what average score would the libvorbis 1.12 have compared to the aoTuV and all the others!?? Aoyumi does a terrific job tweaking the Noise Normalization code and the bitrate allocation scheme. This is not just for the community; it's also a Xiph bounty, don't forget. In terms of streaming, a lot of people perceptually prefer Noise Normalization, which sounds more natural than SBR or PNS. I think the AAC and Vorbis psychoacoustic models are both unique and different from a technical perspective, but what can be seen is that they are more perceptually advanced today than what we would have seen 5 years ago. These listening tests are a clear example of this. You are never going to achieve 100% transparency in every case, but a lot of samples give you a great indication of the performance of the encoder.
I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.
|
|
|
Jan 18 2006, 01:01
Post
#160
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.
He once mentioned that he was considering adjusting the masking threshold for long blocks, but he couldn't touch it for "other" reasons. Most of his tunings now go into Noise Normalization and bitrate allocation. As you know, he made a few significant changes in the past to the psychoacoustic model to deal with the "HF boost" issue. He made some simple additions to the code that somehow adjust the MDCT in conjunction with the psymodel? That part I really don't understand too much, since trying to figure out how the transform interacts with everything else is confusing. It is a bit of a learning experience reading through it, though. The AoTuV Beta 3 tunings were also merged into the latest libvorbis, I believe.
This post has been edited by HotshotGG: Jan 18 2006, 01:08
-------------------- College student/IT Assistant
|
|
|
|
Jan 18 2006, 01:10
Post
#161
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (HotshotGG @ Jan 18 2006, 10:01 AM) QUOTE I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry. He once mentioned that he was considering adjusting the masking threshold for long blocks, but he couldn't touch it for "other" reasons. Most of his tunings now go into Noise Normalization and bitrate allocation. As you know, he made a few significant changes in the past to the psychoacoustic model to deal with the "HF boost" issue. He made some simple additions to the code that somehow adjust the MDCT in conjunction with the psymodel? That part I really don't understand too much, since trying to figure out how the transform interacts with everything else is confusing. It is a bit of a learning experience reading through it, though.
Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.
|
|
|
Jan 18 2006, 01:19
Post
#162
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.
He was only using threshold-by-band masking for some reason (why not experiment, though?). Have you experimented with wavelet filterbanks or anything of that nature? I know you mentioned 9/7 biorthogonal experiments, etc. If I had more coding experience I would try to help out, but I have only written simple data algorithms, and a lot of the structures in here are immense.
This post has been edited by HotshotGG: Jan 18 2006, 01:33
-------------------- College student/IT Assistant
|
|
|
|
Jan 18 2006, 01:33
Post
#163
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (HotshotGG @ Jan 18 2006, 10:19 AM) QUOTE Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source. He was only using threshold-by-band masking for some reason (why not experiment, though?). Have you experimented with wavelet filterbanks or anything of that nature? I know you mentioned 9/7 biorthogonal experiments, etc.
Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth. Oops, we're going OT here. Sorry.
This post has been edited by QuantumKnot: Jan 18 2006, 01:34
|
|
|
Jan 18 2006, 01:40
Post
#164
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.
Yeah, you don't find too many publications or research papers on wavelets with regard to audio processing. Their basis functions are more suited to image and video coding. They also look 10x smoother than DCT-based implementations; I was impressed. I am sure there are some filterbanks that would work, though. Okedoke, back to testing.
This post has been edited by HotshotGG: Jan 18 2006, 01:41
-------------------- College student/IT Assistant
|
|
|
|
Jan 18 2006, 01:52
Post
#165
|
|
|
Group: Members Posts: 1315 Joined: 3-January 05 From: Argentina, Bs As Member No.: 18803 |
QUOTE (QuantumKnot @ Jan 17 2006, 04:33 PM) Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.
Sorry for the off-topic. Even for image coding there are still no results. Take the wavelet-based video codec Snow, for example. It was a promising codec built on new technology. However, development of this codec was and is too slow. It is hard to say why. Maybe the devs are in no hurry with it, or wavelets are not powerful enough. Meanwhile, x264 (with its great RDO) was and is developing very fast.
This post has been edited by IgorC: Jan 18 2006, 01:53
|
|
|
Jan 18 2006, 02:47
Post
#166
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (IgorC @ Jan 18 2006, 10:52 AM) QUOTE (QuantumKnot @ Jan 17 2006, 04:33 PM) Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth. Sorry for the off-topic. Even for image coding there are still no results. Take the wavelet-based video codec Snow, for example. It was a promising codec built on new technology. However, development of this codec was and is too slow. It is hard to say why. Maybe the devs are in no hurry with it, or wavelets are not powerful enough. Meanwhile, x264 (with its great RDO) was and is developing very fast.
Well, there is that extra dimension in video. But for image coding, wavelets are definitely the best. They outperform the best block transform coders by a mile, plus it's been shown mathematically why they do better too (for images at least).
|
|
|
Jan 18 2006, 04:00
Post
#167
|
|
|
Group: Members Posts: 169 Joined: 30-September 01 From: Tokyo, Japan Member No.: 99 |
First of all, thanks to Sebastian and the participants for conducting and contributing to this listening test. I couldn't contribute to this test, but it was interesting to see that many encoders are reaching near-transparent quality at 128kbps.
QUOTE (QuantumKnot @ Jan 18 2006, 08:47 AM)
QuantumKnot (and guruboolez too), thanks for mentioning it, although not much of it is still alive in aoTuV's code.
QUOTE (QuantumKnot @ Jan 18 2006, 09:10 AM) Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples
I'd love to listen to it.
|
|
|
Jan 18 2006, 04:34
Post
#168
|
|
|
Group: Members Posts: 169 Joined: 30-September 01 From: Tokyo, Japan Member No.: 99 |
As the table Sehested generated in post #149 shows, 60%-80% of the evaluation trials in the test could not distinguish the compressed samples from the original. However, I found a Japanese blog that extracts guruboolez's listening results from the whole result set and analyzes them:
http://anonymousriver.hp.infoseek.co.jp/#2...15-1-guruboolez
Although this blog is written in Japanese, you will easily find a score table and the results of various statistical analyses, including Friedman.
Parametric Tukey's HSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_PT.txt
Blocked ANOVA / Fisher's LSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_BA.txt
Non-parametric Tukey's HSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_NT.txt
Friedman / Nonparametric Fisher's LSD: http://anonymousriver.hp.infoseek.co.jp/12...uruboolez_F.txt
guruboolez rarely rated the compressed samples 5.0. His scoring is also different, especially for LAME, Nero, and WMA Pro. So as not to give a false impression: I don't mean to disrespect the overall result of this test. But it was interesting to see how brilliantly he listens to samples.
|
|
|
Jan 18 2006, 04:46
Post
#169
|
|
![]() Group: Members Posts: 1189 Joined: 19-May 05 From: Montreal, Canada Member No.: 22144 |
QUOTE (nyaochi @ Jan 17 2006, 09:34 PM) As the table Sehested generated in post #149 shows, 60%-80% of the evaluation trials in the test could not distinguish the compressed samples from the original. However, I found a Japanese blog that extracts guruboolez's listening results from the whole result set and analyzes them [...] guruboolez rarely rated the compressed samples 5.0. His scoring is also different, especially for LAME, Nero, and WMA Pro. So as not to give a false impression: I don't mean to disrespect the overall result of this test. But it was interesting to see how brilliantly he listens to samples.
I always knew Francis was gifted ;-) Keep it up!
|
|
|
Jan 18 2006, 05:57
Post
#170
|
|
![]() Group: Developer Posts: 1245 Joined: 16-December 02 From: Australia Member No.: 4097 |
QUOTE (nyaochi @ Jan 18 2006, 01:00 PM) That's assuming I will find time to experiment with it. Oh I better stay on topic....hmm....looks like someone beat me to compiling guru's results. I was also interested in his results. |
|
|
|
Jan 18 2006, 07:15
Post
#171
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
QUOTE (Gecko @ Jan 17 2006, 11:22 PM) I'm having some trouble with the comments I see on the results page. On top it says: "One codec can be said to be better than another with 95% confidence if the bottom of its segment is at or above the top of the competing codec's line segment." In other words, they don't overlap. With which I agree, but right on the first plot it says: "iTunes is not as good as AoTuV", but the confidence intervals do overlap. What is correct? Well, what I wanted to say is that they are tied, but the difference between iTunes and AoTuV is bigger than the difference between AoTuV and WMA Professional for example. -------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Jan 18 2006, 09:40
Post
#172
|
|
![]() Group: Members Posts: 218 Joined: 12-October 01 Member No.: 278 |
[As a help in following discussions about Ogg Vorbis throughout HydrogenAudio, I'd like to know which of the posters read Japanese and communicate fairly regularly with Aoyumi. I'm guessing that QuantumKnot and nyaochi do ... ]
I think this is a great test (special kudos to Sebastian!), but the moral seems to be that quality is high and differences between the best codecs are now quite small at these rates. At least, that's so for short music samples ... What I still wonder about is whether there is an additional quality standard of exhaustion in prolonged listening, and whether this type of test necessarily relates to that. We know that extended listening to music with distortion, especially intermodulation, will somehow just wear down and exhaust the listener. The way the listener copes is to stop listening, not necessarily even aware that the distortion frayed his nerves. (Some will recall the puzzlement about early CDs, which were claimed to have imperceptible distortion but nonetheless sounded appalling!) The endurance test I'm thinking of might require subjects to wear the best, most comfortable headphones and listen to music encoded with these same codecs until the last subject yanks off the headphones screaming for mercy.
|
|
|
Jan 18 2006, 13:39
Post
#173
|
|
|
Group: Members Posts: 236 Joined: 14-January 04 From: Kanto, Japan Member No.: 11215 |
Thank you to Sebastian, the people involved, and the participants in the test.
I am especially interested in each individual result.
QUOTE I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.
I have not changed the block switching algorithm. I am looking forward to your research.
QUOTE As you know, he made a few significant changes in the past to the psychoacoustic model to deal with the "HF boost" issue.
This part will change again in the next version.
QUOTE The AoTuV Beta 3 tunings were also merged into the latest libvorbis, I believe.
aoTuV beta3 is not merged into libvorbis. Please check the source code. The official libvorbis does not include any change that affects encoding quality after 1.1.
EDIT: Typo and addition
This post has been edited by Aoyumi: Jan 19 2006, 13:45
|
|
|
Jan 18 2006, 16:48
Post
#174
|
|
![]() Group: Members (Donating) Posts: 325 Joined: 5-April 04 From: Copenhagen, Denmark Member No.: 13246 |
What if...
Most samples were rated 5.0, and some testers even ranked the references. A few of the more experienced testers were able to avoid using 5.0 at all. The criterion for a valid result is currently that no references are ranked. However, this causes some testers to be very conservative and rate 5.0 when in doubt. What if the ranked references were used? What if only results that had no 5.0 ratings were used? How would the overall results then look?
In the second line of the table below I have converted references ranked 4.0 and above to 5.0 ratings and included these "invalid" results. In the third line I have reduced the number of test results to include only results without 5.0 ratings.
CODE
                    iTunes  LAME  Nero  Shine  AuTuV  WMA pro
Official result       4.74  4.60  4.68   2.35   4.79     4.70  (402)
Ranked references     4.74  4.60  4.70   2.38   4.78     4.72  (464)
No 5.0 ratings        3.90  3.74  3.57   1.51   3.91     3.69  (54)

ANOVA analysis for "No 5.0 ratings":
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 54
Critical significance: 0.05
Significance of data: 0.00E+00 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F       p

Total               323       368.45
Testers (blocks)     53        57.63
Codecs eval'd         5       233.42    46.68   159.82  0.00E+00
Error               265        77.40     0.29
---------------------------------------------------------------
Fisher's protected LSD for ANOVA: 0.205

Means:

AuTuV    iTunes   LAME     WMA-pro  Nero     Shine
3.91     3.90     3.74     3.69     3.57     1.51

---------------------------- p-value Matrix ---------------------------

         iTunes   LAME     WMA-pro  Nero     Shine
AuTuV    0.943    0.099    0.042*   0.001*   0.000*
iTunes            0.114    0.049*   0.002*   0.000*
LAME                       0.696    0.110    0.000*
WMA-pro                             0.227    0.000*
Nero                                         0.000*
-----------------------------------------------------------------------

AuTuV is better than WMA-pro, Nero, Shine
iTunes is better than WMA-pro, Nero, Shine
LAME is better than Shine
WMA-pro is better than Shine
Nero is better than Shine

Edit: Added ANOVA analysis
This post has been edited by sehested: Jan 18 2006, 17:05
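The "is better than" lines in the ANOVA output follow from comparing each pair of means against Fisher's protected LSD of 0.205. A minimal Python sketch of that comparison, using the rounded means and the sums of squares from the post (the variable names are illustrative only, and because the means are rounded to two decimals a borderline pair could in principle come out differently than in the original program):

```python
from itertools import combinations

# Means and LSD copied from the "No 5.0 ratings" ANOVA output in the post.
means = {
    "AuTuV": 3.91, "iTunes": 3.90, "LAME": 3.74,
    "WMA-pro": 3.69, "Nero": 3.57, "Shine": 1.51,
}
LSD = 0.205

# F statistic recomputed from the ANOVA table: mean square = SS / df.
ms_codecs = 233.42 / 5
ms_error = 77.40 / 265
f_stat = ms_codecs / ms_error

# A codec is "better" when its mean exceeds the other's by more than the LSD.
better_than = {name: [] for name in means}
for a, b in combinations(means, 2):
    hi, lo = (a, b) if means[a] >= means[b] else (b, a)
    if means[hi] - means[lo] > LSD:
        better_than[hi].append(lo)

print(f"F = {f_stat:.2f}")
for name, beaten in better_than.items():
    if beaten:
        print(f"{name} is better than {', '.join(beaten)}")
```

The recomputed F comes out slightly different from the posted 159.82 because the sums of squares in the table are themselves rounded.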
|
|
|
Jan 19 2006, 17:42
Post
#175
|
|
![]() Group: Members Posts: 1455 Joined: 22-November 05 From: Jakarta Member No.: 25929 |
Okay links to this thread and the complete results have been placed in the wiki page, which you can see here.
Fill the page, guys! (and gals!) -------------------- Nobody is Perfect.
I am Nobody. http://pandu.poluan.info |
|
|
|