Multiformat Listening Test @ 128 kbps - FINISHED

Reply #150 – 2006-01-17 19:55:33

Quote

Perhaps this question has been asked many times, but I'd like to know what it is what makes this aoTuVb4.51 version so extremely good compared to the libvorbis 1.12 version? I even hear "the resurrection of Vorbis" is due to aoTuV!
If the aoTuVb4.51 has a score of 4,79 in this test, what average score would the libvorbis 1.12 have compared to the aoTuV and all the others!??

Aoyumi does a terrific job tweaking the Noise Normalization code and the bitrate allocation scheme. This is not just for the community; don't forget it's also a Xiph bounty. In terms of streaming, a lot of people perceptually prefer Noise Normalization, which sounds more natural than SBR or PNS. I think the AAC and Vorbis psychoacoustic models are each unique and different from a technical perspective, but what can be seen is that they are more perceptually advanced today than what we would have seen 5 years ago. These listening tests are a clear example of this. You are never going to achieve 100% transparency in every case, but a lot of samples give you a great indication of the performance of the encoder.

Reply #151 – 2006-01-17 20:11:27

Quote

i mean, do you have an updated xls to download?

I wonder how you always manage to screw up the quotes...

Anyways, the RAR and the XLS were updated and have the same names, so just redownload the files.

Reply #152 – 2006-01-17 20:14:04

Quote

I even hear "the resurrection of Vorbis" is due to aoTuV!

Besides promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", IMHO, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot.

- Lyx

Reply #153 – 2006-01-17 20:14:40

Quote

Quote
How many of the grades given were 5.0? And how much if you re-add the ranked references as meaning that codec got a 5.0 for that sample?

So, there's 403 valid test results times 5 codecs (Shine doesn't count), or about 2015 grades. How many of those are 5.0, i.e. perfectly transparent?

@Sebastian: Thanks for the test results.

I managed to process the results and produce this table:
Code:
Ranked refs      24     14     18      6     19     25
5.0's           304    260    299     36    313    302
5.0's %          75%    65%    74%     9%    78%    75%
4.0 and above   361    334    358     60    375    355
4.0 and above %  90%    83%    89%    15%    93%    88%

Since one of the results was invalid (for sample 8 as you can read in my previous posts and on the updated results page), the new number of valid results is 402 - your XLS (which you sent to me via e-mail) shows the old number 403. This is because Shade[ST] sent two different results for the same sample. Both results were valid (didn't contain ranked references IIRC), but only one of them was used.
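
For anyone who wants to re-check the quoted table, here is a minimal sketch of the arithmetic. It assumes the percentages were taken against the old total of 403 valid results; the table does not label its columns, so the code refers to them by position only, and the function names are just illustrative.

```python
# Counts copied from the quoted table; columns are unlabeled there,
# so they are kept in their original left-to-right order.
FIVES = [304, 260, 299, 36, 313, 302]      # row "5.0's"
FOUR_PLUS = [361, 334, 358, 60, 375, 355]  # row "4.0 and above"


def percentages(counts, n_results):
    """Return each count as a whole-number percentage of n_results."""
    return [round(100 * count / n_results) for count in counts]


def total_grades(n_results, n_codecs):
    """Number of individual grades: one per result per codec."""
    return n_results * n_codecs


if __name__ == "__main__":
    print(total_grades(403, 5))         # 403 results x 5 codecs
    print(percentages(FIVES, 403))      # compare with the "5.0's %" row
    print(percentages(FOUR_PLUS, 403))  # compare with the "4.0 and above %" row
```

With 403 as the denominator both percentage rows of the table are reproduced. The counts themselves would have to be regenerated from the updated XLS to get figures for 402 results.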

Reply #154 – 2006-01-17 20:37:42

Quote

I wonder how you always manage to screw up the quotes...

i wonder that too

Reply #155 – 2006-01-17 20:42:23

Quote

Besides promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", IMHO, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot.

Yeah, because they were assigned projects from now to 2015 (that's an exaggeration). Besides, it's a community project anyway. In that time I think I have seen all but two of the bounties completed: 5.1 and bitrate peeling. The reason is that those two have more to do with the scope of the low-level libraries and the encoder than with other things. Our good friend John33 did a good job fixing the channel mapping though, so it's halfway there. Anyway, ignore my rantings and continue on with the listening test discussion.

Reply #156 – 2006-01-17 22:22:31

I'm having some trouble with the comments I see on the results page.
On top it says: "One codec can be said to be better than another with 95% confidence if the bottom of its segment is at or above the top of the competing codec's line segment." In other words, they don't overlap. With which I agree, but right on the first plot it says: "iTunes is not as good as AoTuV", but the confidence intervals do overlap. What is correct?
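
To make the quoted rule concrete, here is a small sketch of it. The `Interval` type, the `compare` function and the numbers in the example call are made up for illustration; they are not values read off the plots.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Bottom and top of one codec's line segment on the plot."""
    low: float
    high: float


def compare(a: Interval, b: Interval) -> str:
    """Apply the rule from the results page to codecs a and b.

    a is "better" only if the bottom of its segment is at or above
    the top of b's segment; if neither side clears the other, the
    two codecs are "tied" at the stated confidence level.
    """
    if a.low >= b.high:
        return "better"
    if b.low >= a.high:
        return "worse"
    return "tied"


if __name__ == "__main__":
    # Hypothetical overlapping segments: neither codec clears the other.
    print(compare(Interval(4.70, 4.88), Interval(4.62, 4.80)))  # tied
```

Under this rule any two overlapping segments come out as tied, no matter how far apart the means are.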

Reply #157 – 2006-01-17 23:47:28

Quote

Quote
I even hear "the resurrection of Vorbis" is due to aoTuV!

Besides promoting hardware support, Xiph has done almost nothing since v1.00: neither bugfixes nor improvements. The reason for Vorbis' "resurrection", IMHO, is two third-party devs: primarily "Aoyumi", and secondarily QuantumKnot.

- Lyx

We should mention Nyaochi too and his "modest tuning".

Reply #158 – 2006-01-17 23:50:47

Quote

Quote
Perhaps this question has been asked many times, but I'd like to know what it is what makes this aoTuVb4.51 version so extremely good compared to the libvorbis 1.12 version? I even hear "the resurrection of Vorbis" is due to aoTuV!
If the aoTuVb4.51 has a score of 4,79 in this test, what average score would the libvorbis 1.12 have compared to the aoTuV and all the others!??

Aoyumi does a terrific job tweaking the Noise Normalization code and the bitrate allocation scheme. This is not just for the community; don't forget it's also a Xiph bounty. In terms of streaming, a lot of people perceptually prefer Noise Normalization, which sounds more natural than SBR or PNS. I think the AAC and Vorbis psychoacoustic models are each unique and different from a technical perspective, but what can be seen is that they are more perceptually advanced today than what we would have seen 5 years ago. These listening tests are a clear example of this. You are never going to achieve 100% transparency in every case, but a lot of samples give you a great indication of the performance of the encoder.

I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.

Reply #159 – 2006-01-18 00:01:21

Quote

I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.

He once mentioned that he was considering adjusting the masking threshold for long blocks, but he couldn't touch it for "other" reasons. Most of his tunings now go into Noise Normalization and bitrate allocation. As you know, he made a few significant changes to the psychoacoustic model in the past to deal with the "HF boost" issue. He made some simple additions to the code that somehow adjust the MDCT in conjunction with the psymodel? That part I really don't understand too much, since trying to figure out how the transform interacts with everything else is confusing. It is a bit of a learning experience reading through it, though. The AoTuV Beta 3 tunings were also merged into the latest libvorbis, I believe.

Reply #160 – 2006-01-18 00:10:05

Quote

Quote
I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.

He once mentioned that he was considering adjusting the masking threshold for long blocks, but he couldn't touch it for "other" reasons. Most of his tunings now go into Noise Normalization and bitrate allocation. As you know, he made a few significant changes to the psychoacoustic model in the past to deal with the "HF boost" issue. He made some simple additions to the code that somehow adjust the MDCT in conjunction with the psymodel? That part I really don't understand too much, since trying to figure out how the transform interacts with everything else is confusing. It is a bit of a learning experience reading through it, though.

Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.

Reply #161 – 2006-01-18 00:19:59

Quote

Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.

He was only using threshold-by-band masking for some reason (why not experiment though?). Have you experimented with wavelet filterbanks or anything of that nature? I know you mentioned 9/7 biorthogonal experiments, etc. If I had more coding experience I would try to help out, but I have only written simple data algorithms, and a lot of the structures in here are immense.

Reply #162 – 2006-01-18 00:33:01

Quote

Quote
Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.

He was only using threshold-by-band masking for some reason (why not experiment though?). Have you experimented with wavelet filterbanks or anything of that nature? I know you mentioned 9/7 biorthogonal experiments, etc.

Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.

Oops, we're going OT here. Sorry.

Reply #163 – 2006-01-18 00:40:16

Quote

Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.

Yeah, you don't find too many publications or research papers on wavelets with regard to audio processing. Their basis functions are more suited to image and video coding. They also look 10x smoother than DCT-based implementations; I was impressed. I am sure there are some filterbanks that would work, though. Okedoke, back to testing.

Reply #164 – 2006-01-18 00:52:21

Quote

Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.

Sorry for offtopic
Even for image coding there are still no results. Take, for example, the wavelet-based video codec Snow. It was a promising new-technology codec. However, development of this codec was and is too slow. It is hard to say why. Maybe the devs are in no hurry with it, or wavelets are not powerful enough. Meanwhile x264 (with its great RDO) was and is developing very fast.

Reply #165 – 2006-01-18 01:47:44

Quote

Quote
Initially I had a play with wavelets, but I then moved onto other techniques. The 9/7 biorthogonal wavelets seemed useless for audio since they were more suited to image coding, where the signal is predominantly smooth.

Sorry for offtopic
Even for image coding there are still no results. Take, for example, the wavelet-based video codec Snow. It was a promising new-technology codec. However, development of this codec was and is too slow. It is hard to say why. Maybe the devs are in no hurry with it, or wavelets are not powerful enough. Meanwhile x264 (with its great RDO) was and is developing very fast.

Well, there is that extra dimension in video. Plus there is often a gap between research and actual implementation I guess.

But for image coding, wavelets are definitely the best. They outperform the best block transform coders by a mile, plus it's been shown mathematically why they do better too (for images at least).

Reply #166 – 2006-01-18 03:00:14

First of all, thank Sebastian and the participants for conducting/contributing this listening test. I couldn't contribute to this test, but it was interesting to see many encoders are reaching near-transparent quality at 128kbps.

Quote

We should mention Nyaochi too and his "modest tuning".

QuantumKnot (and guruboolez too), thanks for mentioning it although it's not so much alive in aoTuV's code. QKTune's HF noise compensation (or the similar technique) still plays an important role in aoTuV.

Quote

Cool, thanks. Lately I've thought of an idea that may make block switching more accurate (hopefully it works better than my last attempt at fixing it, which only solved two problem samples) and hope it doesn't infringe on any patents, as Monty warned in the source.

I'd love to listen to it.

Reply #167 – 2006-01-18 03:34:22

As the table Sehested generated in post #149 shows, 60%-80% of the evaluation trials in the test could not distinguish the compressed samples from the original. However, I found a Japanese blog extracting guruboolez's listening results from the whole result set and analyzing them:
http://anonymousriver.hp.infoseek.co.jp/#2...15-1-guruboolez
Although this blog is written in Japanese, you will easily find a score table and the results from various statistical analyses including Friedman.
Parametric Tukey's HSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_PT.txt
Blocked ANOVA / Fisher's LSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_BA.txt
Non-parametric Tukey's HSD: http://anonymousriver.hp.infoseek.co.jp/12...ruboolez_NT.txt
Friedman / Nonparametric Fisher's LSD: http://anonymousriver.hp.infoseek.co.jp/12...uruboolez_F.txt

guruboolez rarely rated the compressed samples 5.0. His scoring is also different, especially for LAME, Nero, and WMA Pro. Just so as not to give a false impression: I don't mean to disrespect the overall result of this test. But it was interesting to see how brilliantly he listens to samples.

Reply #168 – 2006-01-18 03:46:37

Quote

As the table Sehested generated in post #149 shows, 60%-80% of the evaluation trials in the test could not distinguish the compressed samples from the original. However, I found a Japanese blog extracting guruboolez's listening results from the whole result set and analyzing them [...]
guruboolez rarely rated the compressed samples 5.0. His scoring is also different, especially for LAME, Nero, and WMA Pro. Just so as not to give a false impression: I don't mean to disrespect the overall result of this test. But it was interesting to see how brilliantly he listens to samples.

I always knew Francis was gifted ;-)

Continue comme ça!

Reply #169 – 2006-01-18 04:57:54

Quote

I'd love to listen to it.

That's assuming I will find time to experiment with it. If you are interested, we could discuss the idea I had. I'll send you an e-mail when the time comes.

Oh I better stay on topic....hmm....looks like someone beat me to compiling guru's results. I was also interested in his results.

Reply #170 – 2006-01-18 06:15:16

Quote

I'm having some trouble with the comments I see on the results page.
On top it says: "One codec can be said to be better than another with 95% confidence if the bottom of its segment is at or above the top of the competing codec's line segment." In other words, they don't overlap. With which I agree, but right on the first plot it says: "iTunes is not as good as AoTuV", but the confidence intervals do overlap. What is correct?

Well, what I wanted to say is that they are tied, but the difference between iTunes and AoTuV is bigger than the difference between AoTuV and WMA Professional for example.

Reply #171 – 2006-01-18 08:40:35

[As a help in following discussions about Ogg Vorbis throughout HydrogenAudio, I'd like to know which of the posters read Japanese and communicate fairly regularly with aoyumi. I'm guessing that QuantumKnot and nyaochi do ... ]

I think this is a great test (special kudos to Sebastian!), but the moral seems to be that quality is high and differences between the best codecs are now quite small at these rates. At least, that's so for short music samples ...
What I still wonder about is whether there is an additional quality standard of exhaustion in prolonged listening, and whether this type of test necessarily relates to that. We know that extended listening to music with distortion, especially intermodulation, will somehow just wear down and exhaust the listener. The way the listener copes is to stop listening, not necessarily even aware that the distortion frayed his nerves. (Some will recall the puzzlement about early CDs which were claimed to have imperceptible distortion, but nonetheless sounded appalling!)
The endurance test I'm thinking of might require subjects to wear the best, most comfortable headphones listening to music encoded with these same codecs until the last subject yanks off the headphones screaming for mercy.

Reply #172 – 2006-01-18 12:39:06

Thank you to Sebastian, everyone involved, and the participants of the test.
I am especially interested in each result.

Quote

I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.

I have not changed the block switching algorithm. I am looking forward to your research.

Quote

As you know, he made a few significant changes to the psychoacoustic model in the past to deal with the "HF boost" issue.

This part will change again in the next version.

Quote

The AoTuV Beta 3 tunings were also merged into the latest libvorbis, I believe.

The aoTuV beta3 has not been merged into libvorbis. Please check the source code. The official libvorbis has not included any change that affects encoding quality since 1.1.

EDIT: TYPO&Addition

Reply #173 – 2006-01-18 15:48:55

What if...

Most samples were rated 5.0, and some testers even ranked the references. A few of the more experienced testers were able to avoid using 5.0 at all.

The criterion for valid results is currently that no references are ranked. However, this causes some testers to be very conservative, rating 5.0 when in doubt.

What if the ranked references were used?
What if only results that had no 5.0 rankings were used?
How would the overall results then look?

In the second line in the table below I have converted references ranked 4.0 and above to 5.0 ratings and included these "invalid" results.

In the third line I have reduced the number of test results to only include results without 5.0 rankings.

Code:

                   iTunes    LAME     Nero    Shine    AuTuV   WMA pro
Official result    4.74     4.60     4.68     2.35     4.79     4.70     (402)
Ranked references  4.74     4.60     4.70     2.38     4.78     4.72     (464)
No 5.0 ratings     3.90     3.74     3.57     1.51     3.91     3.69      (54)

ANOVA analysis for "No 5.0 ratings":

Code:

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 54
Critical significance:  0.05
Significance of data: 0.00E+00 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total              323         368.45
Testers (blocks)    53          57.63
Codecs eval'd        5         233.42   46.68   159.82  0.00E+00
Error              265          77.40    0.29
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.205

Means:

AuTuV    iTunes   LAME     WMA-pro  Nero     Shine    
  3.91     3.90     3.74     3.69     3.57     1.51   

---------------------------- p-value Matrix ---------------------------

         iTunes   LAME     WMA-pro  Nero     Shine    
AuTuV    0.943    0.099    0.042*   0.001*   0.000*   
iTunes            0.114    0.049*   0.002*   0.000*   
LAME                       0.696    0.110    0.000*   
WMA-pro                             0.227    0.000*   
Nero                                         0.000*   
-----------------------------------------------------------------------

AuTuV is better than WMA-pro, Nero, Shine
iTunes is better than WMA-pro, Nero, Shine
LAME is better than Shine
WMA-pro is better than Shine
Nero is better than Shine

Edit: Added ANOVA analysis
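
For readers who want to see where the F value, the LSD and the "is better than" list come from, here is a rough re-computation from the rounded figures printed in the ANOVA output above. It is only a sketch: the critical t value of 1.969 (two-sided 5%, 265 degrees of freedom) is my assumption taken from standard tables, not something printed by the tool, and the function names are illustrative.

```python
import math
from itertools import combinations

# Figures copied from the ANOVA output above (already rounded there).
N_LISTENERS = 54
SS_CODECS, DF_CODECS = 233.42, 5
SS_ERROR, DF_ERROR = 77.40, 265

# Assumption: two-sided 5% critical value of Student's t for 265 d.f.
T_CRIT = 1.969

# Means in the order printed by the tool (highest first).
MEANS = {"AuTuV": 3.91, "iTunes": 3.90, "LAME": 3.74,
         "WMA-pro": 3.69, "Nero": 3.57, "Shine": 1.51}


def f_statistic():
    """F = mean square of codecs / mean square of error."""
    return (SS_CODECS / DF_CODECS) / (SS_ERROR / DF_ERROR)


def fisher_lsd():
    """Least significant difference between two codec means."""
    mse = SS_ERROR / DF_ERROR
    return T_CRIT * math.sqrt(2 * mse / N_LISTENERS)


def significant_pairs():
    """Pairs (better, worse) whose means differ by more than the LSD."""
    lsd = fisher_lsd()
    return [(a, b) for a, b in combinations(MEANS, 2)
            if MEANS[a] - MEANS[b] > lsd]


if __name__ == "__main__":
    print(f"F   = {f_statistic():.2f}")
    print(f"LSD = {fisher_lsd():.3f}")
    for better, worse in significant_pairs():
        print(f"{better} is better than {worse}")
```

Because the inputs are rounded, F comes out within a few hundredths of the printed 159.82 rather than matching it exactly; the LSD rounds to the printed 0.205, and the nine pairs that exceed it are the same ones listed at the end of the output.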

Reply #174 – 2006-01-19 16:42:26

Okay links to this thread and the complete results have been placed in the wiki page, which you can see here.

Fill the page, guys! (and gals!)
