Full Version: 64 kbps listening test 2005
Hydrogenaudio Forums > Hydrogenaudio Forum > Listening Tests
sehested
Yet another sample suggestion:

Dire Straits - Iron Hand
S_O
QUOTE
The HE-AAC encoder from Helix Producer hasn't been improved since the last 64 kbps listening test. The only new thing is the introduction of new target bitrates (32 and 48 kbps).
Huh? I cannot remember a listening test in which Helix HE-AAC was tested. There was a 64 kbps test with RealAudio Cook once, but with Helix HE-AAC? Not that I remember.
rjamorim
QUOTE(Gecko @ Mar 22 2005, 08:06 PM)
But if you need to run a separate test anyway then you might as well do it beforehand. :) Maybe there are serious flaws which could ruin Vorbis' reputation!
*



Yes, I believe it would be very bad if Vorbis' reputation got ruined by a badly tuned encoder.

Also, this test is only supposed to start after Apple releases iTunes 5 / QuickTime 7. That should give people plenty of time to conduct parallel tests, if there is really any interest...

QUOTE(kurtnoise @ Mar 22 2005, 08:19 PM)
The HE-AAC encoder from Helix Producer hasn't been improved since the last 64 kbps listening test.
*



It wasn't even tested back then :B
Acid Orange Juice
IMO this sample could be very useful for this test. In particular, it causes a lot of phasing problems with lame --preset cbr 128 and with Vorbis at low bitrates.

Download HERE
znode
QUOTE(Gecko @ Mar 22 2005, 03:06 PM)
But if you need to run a separate test anyway then you might as well do it beforehand. :) Maybe there are serious flaws which could ruin Vorbis' reputation!
*



Yeah, misinterpretation of results is quite a problem. I've seen countless people claim vorbis to be always better than any other codec, in all cases, because of those http://www.rjamorim.com/test/multiformat128/results.html results at 128kbps.
HotshotGG
QUOTE
If WMA were left out, half of the wannabe Slashdotters would be asking where WMA is, the "CD quality at 64 kbps" codec. Were the HA LAME and MPC lovers afraid of it?


:lol: I don't know what to make of that site, so I don't ask. Informative, yes. Always on par with the issues at hand, no. Rating system, nah :D For a site full of educated nerds they sure don't act like it sometimes :P

QUOTE
IMO this sample could be very useful for this test. In particular, it causes a lot of phasing problems with lame --preset cbr 128 and with Vorbis at low bitrates.


Channel-coupling related, maybe? Hmm, I will have to test that out myself later.

QUOTE
Yeah, misinterpretation of results is quite a problem. I've seen countless people claim vorbis to be always better than any other codec, in all cases, because of those tests


Well, it would be great to see both Nero HE-AAC and Vorbis tied for first ;-D. A streaming listening test was definitely going to be needed eventually, though.

kurtnoise
QUOTE(rjamorim @ Mar 23 2005, 01:53 AM)
QUOTE(kurtnoise @ Mar 22 2005, 08:19 PM)
The HE-AAC encoder from Helix Producer hasn't been improved since the last 64 kbps listening test.
*



It wasn't even tested back then :B
*


Oops, sorry for the confusion... :o I was tired last night.



So, the HE-AAC encoder from Helix Producer could be interesting, though... :D
Sebastian Mares
QUOTE(Gabriel @ Mar 22 2005, 08:01 PM)
Sample proposition: the beginning of "Money" by Pink Floyd.

I do not have it available, but I am sure some Pink Floyd fan could upload it.
Basically it is background music with coins and cash machine sounds. I think that the coins could be interesting.
*



http://www.hydrogenaudio.org/forums/index....showtopic=32628

Anyone interested in Time? :P
Latexxx
Why not also include 3GPP's CT HE-AAC + PS? ;)
Sebastian Mares
QUOTE(Latexxx @ Mar 23 2005, 01:18 PM)
Why not also include 3GPP's CT HE-AAC + PS? ;)
*



Wouldn't that be a HE-AAC 64 kbps listening test then? :rolleyes:
Gabriel
All those propositions are transforming this into a 64kbps MPEG test...
Sebastian Mares
Anyways...

Regarding Vorbis, I would love it if some Vorbis users could start a small listening test and compare AoTuV3 and Xiph 1.1 so that the better version will be used in this test.

As for the discussion about mp3PRO or ATRAC3+, I think that I will use ATRAC3+, since it is more widespread than mp3PRO and since mp3PRO hasn't changed since the last test.

Still not sure what to do with WMA - either Standard or Professional. I, personally, would choose Standard since it's the format you find in music stores and it's also what most people use so it's compatible with their players.
moi
I think you really need to include WMA Standard, as it is probably the most common format encoded at 64 kbps; people will want to see how it compares with the others in the test. It should of course be the newest version, WMA 9.1, which is installed with WMP10.

You might also wish to include WMA Pro, to see how it compares at that bit rate to standard.

I really don't see why LAME at 128 kbps should be included in a 64 kbps listening test, as it was last time. It probably has something to do with the claim that WMA at 64 kbps sounds "as good as" MP3 at 128 kbps. I don't think many here believe that claim. In any case, IMO, a 64 kbps listening test should only include music encoded at 64 kbps. It is misleading to encode one format at 128 kbps and all the others at 64 kbps. Everything in a 64 kbps listening test should be encoded at 64 kbps.

I think mp3PRO should be included, as it did very well in some 64 kbps tests in the past. It is not supported by many players, but it is by some. I think it should be included, whether or not it has changed.
Redmond
QUOTE(moi @ Mar 23 2005, 05:58 AM)
You might also wish to include WMA Pro, to see how it compares at that bit rate to standard.
*



"Publicly available" WMA Pro encoders do not go down to 64Kb/s stereo.
Latexxx
QUOTE(moi @ Mar 23 2005, 03:58 PM)
I really don't see why LAME at 128 kbps should be included in a 64 kbps listening test, as it was last time. It probably has something to do with the claim that WMA at 64 kbps sounds "as good as" MP3 at 128 kbps. I don't think many here believe that claim. In any case, IMO, a 64 kbps listening test should only include music encoded at 64 kbps. It is misleading to encode one format at 128 kbps and all the others at 64 kbps. Everything in a 64 kbps listening test should be encoded at 64 kbps.
*


A credible listening test should have a low and high anchor.
Aoyumi
QUOTE(Sebastian Mares @ Mar 23 2005, 10:33 PM)
Regarding Vorbis, I would love it if some Vorbis users could start a small listening test and compare AoTuV3 and Xiph 1.1 so that the better version will be used in this test.
*


When the test is performed, I would like to submit the newest experimental version.
With some samples, it is clearly better than aoTuV beta3 (at low bitrate settings).
PoisonDan
QUOTE(Aoyumi @ Mar 23 2005, 04:32 PM)
QUOTE(Sebastian Mares @ Mar 23 2005, 10:33 PM)
Regarding Vorbis, I would love it if some Vorbis users could start a small listening test and compare AoTuV3 and Xiph 1.1 so that the better version will be used in this test.
*


When the test is performed, I would like to submit the newest experimental version.
With some samples, it is clearly better than aoTuV beta3 (at low bitrate settings).
*


Were you planning on releasing a new version soon anyway? I wouldn't want you to feel rushed to get a version out the door just to be in time for this listening test...

At this moment, I'm extremely busy with real-life and work-related stuff, but next week I'll probably have some time to do a few Vorbis listening tests...

Sebastian Mares
Well, take your time, since the test will start after Apple releases their HE-AAC encoder. :)
rjamorim
QUOTE(Redmond @ Mar 23 2005, 11:14 AM)
"Publicly available" WMA Pro encoders do not go down to 64Kb/s stereo.
*



If I remember correctly, the publicly available encoder stays around 64kbps if you choose the lowest VBR setting (10).
Sebastian Mares
Regarding the low anchor, do you think LAME or FhG should be used at 64 kbps?
sehested
QUOTE(Sebastian Mares @ Mar 23 2005, 02:15 PM)
Regarding the low anchor, do you think LAME or FhG should be used at 64 kbps?
*


I would like to see LAME as low anchor.

That would also demonstrate the improvements of the other codecs compared to the best MP3 encoder available.
ff123
QUOTE(sehested @ Mar 23 2005, 02:57 PM)
QUOTE(Sebastian Mares @ Mar 23 2005, 02:15 PM)
Regarding the low anchor, do you think LAME or FhG should be used at 64 kbps?
*


I would like to see LAME as low anchor.

That would also demonstrate the improvements of the other codecs compared to the best MP3 encoder available.
*



This presumes that lame is the best mp3 encoder at 64 kbps, which isn't a given. The question of which mp3 encoder to use as a low anchor probably deserves a pre-test if people are interested in using the best-sounding one.

ff123
guruboolez
Two suggestions:

• I think that the current scale isn't really suited to 64 kbps and the distortions expected at that bitrate.
Artifacts that are only "perceptible but not annoying" (4.0) are probably not very common at this bitrate, and few encoders are able (in my opinion) to reproduce a sound with only a "slightly annoying" (3.0) difference at 64 kbps. It's possible to change the corresponding scale with schnofler's ABC/HR, and I wonder whether it's worth thinking about. If I remember correctly, the average rating I gave to most encoders during the 32 kbps test was below 1.5/5 :unsure:


• I also suggest reducing the length of all samples. I'm the first one to provide 30-second samples, but I know the drawbacks perfectly well. Some people will rate one encoder on a short range at the beginning, others will evaluate another part (totally different from the first one), etc. In the end, it's exactly as if people had evaluated different samples. I suggest limiting the duration to 6 or 7 seconds.
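To make the scale point concrete, here is a minimal Python sketch (not part of ABC/HR) that maps a mean score onto the nearest labelled point of a 1-5 scale. The impairment wording is the one quoted above; the "Excellent" to "Bad" set is an assumption modelled on common MOS-style quality labels, and the exact strings a given ABC/HR build uses may differ.

```python
from typing import Mapping

# Impairment wording for the 1-5 scale, as quoted in the post above.
IMPAIRMENT_LABELS = {
    5: "Imperceptible",
    4: "Perceptible but not annoying",
    3: "Slightly annoying",
    2: "Annoying",
    1: "Very annoying",
}

# Assumed alternative wording (MOS-style quality labels); illustrative only.
QUALITY_LABELS = {
    5: "Excellent",
    4: "Good",
    3: "Fair",
    2: "Poor",
    1: "Bad",
}


def nearest_label(score: float, labels: Mapping[int, str]) -> str:
    """Return the label of the scale point closest to a 1.0-5.0 score.

    Exact ties (e.g. 1.5) resolve to the higher point because the
    mappings above are ordered from 5 down to 1.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1.0 and 5.0")
    point = min(labels, key=lambda p: abs(p - score))
    return labels[point]


if __name__ == "__main__":
    # A mean just under 1.5 lands on the very bottom of either scale.
    print(nearest_label(1.4, IMPAIRMENT_LABELS))  # Very annoying
    print(nearest_label(1.4, QUALITY_LABELS))     # Bad
```

If most averages sit below 1.5, nearly all results crowd onto the bottom one or two labels, which is the argument for rewording the scale for a 64 kbps test.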
Sebastian Mares
QUOTE(guruboolez @ Mar 24 2005, 08:50 AM)
• I also suggest reducing the length of all samples. I'm the first one to provide 30-second samples, but I know the drawbacks perfectly well. Some people will rate one encoder on a short range at the beginning, others will evaluate another part (totally different from the first one), etc. In the end, it's exactly as if people had evaluated different samples. I suggest limiting the duration to 6 or 7 seconds.
*



I understand what you mean, but why not let testers decide which portion they want to ABX?
Gabriel
I also think that 30s might be too long.
Perhaps 6s is too short, but I think that 15s should be enough.

Letting testers decide which portion to use perhaps reduces the "usefulness" of the results. It is as if they were testing different samples, which makes correlating results for the same sample harder.

If a sample has some quite different parts within 30 s, then it could be interesting to split it into 2 samples, making the results easier to interpret.
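Splitting a heterogeneous clip into two samples is a simple frame-level operation on an uncompressed WAV. A minimal sketch using only Python's standard `wave` module, assuming PCM WAV input; `copy_frames` and `split_in_two` are hypothetical helper names, not part of any test tool.

```python
import wave
from typing import Tuple


def copy_frames(src: str, dst: str, start: int, count: int) -> None:
    """Copy `count` PCM frames beginning at frame `start` from src to dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        if start < 0 or count <= 0 or start + count > reader.getnframes():
            raise ValueError("requested range is outside the file")
        reader.setpos(start)
        frames = reader.readframes(count)
    with wave.open(dst, "wb") as writer:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically by the wave writer.
        writer.setparams(params)
        writer.writeframes(frames)


def split_in_two(src: str, first_dst: str, second_dst: str) -> Tuple[int, int]:
    """Split src at its midpoint frame; return (frames in first, frames in second)."""
    with wave.open(src, "rb") as reader:
        total = reader.getnframes()
    half = total // 2
    copy_frames(src, first_dst, 0, half)
    copy_frames(src, second_dst, half, total - half)
    return half, total - half
```

In practice the cut point would be chosen by ear where the character of the music changes, not at the exact midpoint; `copy_frames` accepts any start frame for that.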
guruboolez
There's nothing wrong with that.
Now imagine a heterogeneous sample: beginning (first seconds) is quiet, whereas the following part is very different.
Suppose that most people will only rate the file on a small part (4-5 seconds). Suppose then that most people will favour the first thing they hear (the beginning). Most, but not all... HE-AAC is very good at the beginning but fails on the second part. Will the overall rating be representative? Isn't it better to provide everyone with a short sample only?

If people evaluate different parts of one sample, it could be considered as evaluating two (or more) different samples (at least if a long sample offers some variety). But correct me if I'm wrong: the purpose of a collective test is to obtain results from different listeners' subjective judgments of the same thing (same sample, same listening material). We can't fully achieve that, since people don't have the same hardware. But we can at least make sure that everyone is listening to the same musical information.


Is it clear?


EDIT: Gabriel was faster, and explained it better :lol:
Aoyumi
QUOTE(PoisonDan @ Mar 23 2005, 11:51 PM)
QUOTE(Aoyumi @ Mar 23 2005, 04:32 PM)
QUOTE(Sebastian Mares @ Mar 23 2005, 10:33 PM)
Regarding Vorbis, I would love it if some Vorbis users could start a small listening test and compare AoTuV3 and Xiph 1.1 so that the better version will be used in this test.
*


When the test is performed, I would like to submit the newest experimental version.
With some samples, it is clearly better than aoTuV beta3 (at low bitrate settings).
*


Were you planning on releasing a new version soon anyway? I wouldn't want you to feel rushed to get a version out the door just to be in time for this listening test...

At this moment, I'm extremely busy with real-life and work-related stuff, but next week I'll probably have some time to do a few Vorbis listening tests...
*


I can at least provide a version targeting the 64 kbps range.
I would like it to be tested.
Music Mixer
One vote for ATRAC3+.

I suggest encoding with SS3 (SonicStage 3), because it seems to have improved.
I would upload some samples, but I can't, because I only have a 56 kbit connection (unfortunately).

P.S.: IMHO it sounds better than MP3 but worse than Vorbis and HE-AAC at 64 kbit.
guruboolez
Another suggestion (related to the samples): instead of focusing too much on musical genre (metal - jazz - classical ...), I think it would be better to choose samples for the kind of signal they represent: loud - quiet - noisy - tonal - attacks...

When I sent Roberto the very quiet sample called Debussy.wav, which apparently had nothing hard to encode, most people were in the end surprised by the poor performance of the champion (Musepack). This sample revealed severe issues with Musepack (even WMA and ATRAC3 were better) at moderate bitrates. I know that some lossy encoders have serious problems with very tonal music (-> ringing); others suffer with low-volume content. There's also pre-echo...


If you're interested, I could propose several samples.
Gabriel
One thing I'd like is to let encoders adapt to the content before the tested position.
Most encoders have adaptive thresholds, and so need a little time to adapt at the beginning. This means that a specific passage would not be encoded the same way at the beginning of a track as in the middle.
I think that a 1 s delay should be enough.

So would it be possible to:

*cut the first second of the decompressed sample?

or

*instruct the ABC/HR software to only allow testing past the first second?
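The first option (cutting the first second after decoding) could look like the following minimal Python sketch. It assumes every decoded file has already been time-aligned with the original (encoder/decoder delay compensated), so trimming the same duration from the reference and from each decoded file keeps them in sync; `drop_leading_audio` is a hypothetical helper, not part of any tool mentioned here.

```python
import wave


def drop_leading_audio(src: str, dst: str, seconds: float = 1.0) -> int:
    """Write src to dst without its first `seconds` of audio.

    Returns the number of frames dropped. Meant to be applied identically
    to the reference and to every decoded file, so encoder warm-up falls
    outside the part that listeners hear.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        skip = int(round(seconds * reader.getframerate()))
        total = reader.getnframes()
        if skip >= total:
            raise ValueError("file is not longer than the warm-up period")
        reader.setpos(skip)
        frames = reader.readframes(total - skip)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(frames)
    return skip
```

The encoders still see the full clip, so their adaptive thresholds settle during the discarded second.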
ff123
I'm not sure if abchr-java can force the following options, but I'm sure schnofler can modify his code:

1) the rating scale description should be changed to the "excellent" to "poor" labels; I already know this option exists, but it should be forced from the configuration file

2) the start time should be forced to X sec into the clip without allowing the listener to hear anything before that time, also specified from the configuration file.

ff123
moi
QUOTE(Latexxx @ Mar 23 2005, 06:22 AM)
QUOTE(moi @ Mar 23 2005, 03:58 PM)
I really don't see why LAME at 128 kbps should be included in a 64 kbps listening test, as it was last time. It probably has something to do with the claim that WMA at 64 kbps sounds "as good as" MP3 at 128 kbps. I don't think many here believe that claim. In any case, IMO, a 64 kbps listening test should only include music encoded at 64 kbps. It is misleading to encode one format at 128 kbps and all the others at 64 kbps. Everything in a 64 kbps listening test should be encoded at 64 kbps.
*


A credible listening test should have a low and high anchor.
*



What does that mean, a high and low anchor? I guess I really don't know what that means--it just seems strange, that for a 64kbps listening test, one of the formats would be tested at 128 kbps rather than at 64 kbps.

If it is to have a reference to compare to, then why not have one sample uncompressed, for listeners to compare the compressed versions with? (Perhaps that's already done.) That makes sense, but I guess I don't understand the "high and low anchor". Please explain.

Does "high anchor" always mean one format is tested at a higher bit rate than the others? For low anchor a lower bit rate? Will you test one format at 32kbps for the "low anchor"?

In the 128kbps listening test, was one of the formats tested at 192kbps for the "high anchor"?
beto
high anchor -> performs noticeably better than the average of the codecs being tested
low anchor -> performs noticeably worse than the average of the codecs being tested

AFAIK this is done to get meaningful statistical results. The high/low anchors are not part of the test itself in the sense that they are not evaluated. They are just a reference...

Someone correct me if I am wrong.
Latexxx
The purpose of anchors is to tie the results to the real world, i.e. when you have anchors, your results no longer "float" in the air. With anchors, you can compare codecs featured in different listening tests to each other, to some extent.
schnofler
QUOTE(ff123 @ Mar 24 2005, 06:41 AM)
1) the rating scale description should be changed to the "excellent" to "poor" labels; I already know this option exists, but it should be forced from the configuration file
*


I'm not sure what exactly you mean by "forced from the configuration file". The custom rating labels can be specified in the test setup dialog and will be saved to the configuration file.

QUOTE(ff123 @ Mar 24 2005, 06:41 AM)
2) the start time should be forced to X sec into the clip without allowing the listener to hear anything before that time, also specified from the configuration file.
*


The offset setting could be used for this. Just adding 1000*X to each of the offsets will have exactly that effect.
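The arithmetic is small enough to show directly. Assuming the offsets are per-file values in milliseconds (which the factor of 1000 suggests), skipping X seconds means adding 1000*X to each of them. A minimal Python sketch; the example offsets are made up, and the configuration file's actual syntax is not reproduced here.

```python
from typing import List, Sequence


def shift_offsets(offsets_ms: Sequence[int], skip_seconds: float) -> List[int]:
    """Add 1000 * skip_seconds milliseconds to every per-file offset."""
    delta_ms = round(1000 * skip_seconds)
    return [offset + delta_ms for offset in offsets_ms]


def ms_to_frames(offset_ms: int, rate_hz: int = 44100) -> int:
    """Convert an offset in milliseconds to whole sample frames (rounded down)."""
    return offset_ms * rate_hz // 1000


if __name__ == "__main__":
    # Hypothetical offsets for three files, e.g. to compensate decoder delay.
    shifted = shift_offsets([0, 24, 48], skip_seconds=1)
    print(shifted)                               # [1000, 1024, 1048]
    print([ms_to_frames(ms) for ms in shifted])  # [44100, 45158, 46216]
```

Because the same amount is added to every offset, any per-file alignment already encoded in the offsets is preserved.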
ff123
QUOTE(schnofler @ Mar 24 2005, 10:26 AM)
QUOTE(ff123 @ Mar 24 2005, 06:41 AM)
1) the rating scale description should be changed to the "excellent" to "poor" labels; I already know this option exists, but it should be forced from the configuration file
*


I'm not sure what exactly you mean by "forced from the configuration file". The custom rating labels can be specified in the test setup dialog and will be saved to the configuration file.

QUOTE(ff123 @ Mar 24 2005, 06:41 AM)
2) the start time should be forced to X sec into the clip without allowing the listener to hear anything before that time, also specified from the configuration file.
*


The offset setting could be used for this. Just adding 1000*X to each of the offsets will have exactly that effect.
*



What I meant is that Sebastian should be able to create a configuration file that everyone uses, and which will control the rating labels. Doh, forgot about those offsets in the config file! That's the easy solution, of course.

ff123
jaybeee
QUOTE(Gabriel @ Mar 24 2005, 12:30 PM)
I also think that 30s might be too long.
Perhaps 6s is too short, but I think that 15s should be enough.

Letting testers decide which portion to use perhaps reduces the "usefulness" of the results. It is as if they were testing different samples, which makes correlating results for the same sample harder.

If a sample has some quite different parts within 30 s, then it could be interesting to split it into 2 samples, making the results easier to interpret.
*



I've just uploaded an 18sec track here that I feel would be good for this test. I deliberated over which section to use and also how long that section was to be - the song is 21min long and has a lot of demanding parts. I think I chose the best part.
Sebastian Mares
Samples should be posted here, please: http://www.hydrogenaudio.org/forums/index....showtopic=32689
schnofler
QUOTE(ff123 @ Mar 24 2005, 12:29 PM)
What I meant is that Sebastian should be able to create a configuration file that everyone uses, and which will control the rating labels.
*


Yes, that is possible.

QUOTE(ff123 @ Mar 24 2005, 12:29 PM)
Doh, forgot about those offsets in the config file!  That's the easy solution, of course.
*


Heh. Yes, I was just about to get to work on the "new" feature myself, when I noticed it's not such a new feature, really. :P
Sebastian Mares
So far, the settings used will be:

Nero HE-AAC: VBR profile "Streaming :: Medium", High Quality
Vorbis: -q 0
WMA Standard: -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2
LAME 3.96.1 (high anchor): -V5 --athaa-sensitivity 1

ATRAC3+ samples will be encoded using whatever settings produce 64kbps, same applies to Apple HE-AAC.

Regarding the low anchor, I would use Adobe Audition 1.5 and the FhG encoder at 64 kbps CBR, but others might want to use LAME.
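To script the encodes, the command-line settings listed above can be kept in one place. A minimal Python sketch, assuming the `lame` and `oggenc` executables are on the PATH and that WMA encoding goes through Windows Media Encoder's `WMCmd.vbs` script run via `cscript` (an assumption: the post only gives the WMA flags). Nero HE-AAC, Apple HE-AAC and ATRAC3+ are configured through GUI profiles, so they are left out; the commands are only built here, not run.

```python
import shlex
from typing import List

# Flags exactly as listed in the post above.
LAME_FLAGS = ["-V5", "--athaa-sensitivity", "1"]  # LAME 3.96.1 (high anchor)
VORBIS_FLAGS = ["-q", "0"]                        # Vorbis quality 0
WMA_FLAGS = ["-a_codec", "WMA9STD", "-a_mode", "3", "-a_setting", "64_44_2"]


def lame_command(src: str, dst: str) -> List[str]:
    # lame takes its options first, then the input and output paths.
    return ["lame", *LAME_FLAGS, src, dst]


def oggenc_command(src: str, dst: str) -> List[str]:
    # oggenc names its output file with -o.
    return ["oggenc", *VORBIS_FLAGS, "-o", dst, src]


def wma_command(src: str, dst: str) -> List[str]:
    # Assumed wrapper: WMCmd.vbs run through cscript with -input/-output.
    return ["cscript", "WMCmd.vbs", "-input", src, "-output", dst, *WMA_FLAGS]


if __name__ == "__main__":
    for build, out in ((lame_command, "sample.mp3"),
                       (oggenc_command, "sample.ogg"),
                       (wma_command, "sample.wma")):
        # shlex.join (Python 3.8+) quotes arguments for display only.
        print(shlex.join(build("sample.wav", out)))
```

Passing a returned list to `subprocess.run(cmd, check=True)` would perform the encode; keeping each setting in a single list makes it harder for the published settings and the actual encodes to drift apart.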
Gabriel
For the high anchor, I would prefer Lame 3.97 (probably in abr setting) that will probably be at least in beta stage when the test should start.
Sebastian Mares
QUOTE(Gabriel @ Mar 25 2005, 04:28 PM)
For the high anchor, I would prefer Lame 3.97 (probably in abr setting) that will probably be at least in beta stage when the test should start.
*



So, --preset 128 then?
Jojo
QUOTE(Gabriel @ Mar 25 2005, 07:28 AM)
For the high anchor, I would prefer Lame 3.97 (probably in abr setting) that will probably be at least in beta stage when the test should start.
*


Just out of curiosity, are you saying that some ABR preset in the new LAME 3.97 build might be better than -V5 --athaa-sensitivity 1?
Sebastian Mares
So, the list of codecs is now pretty much done:

Apple HE-AAC
Nero HE-AAC
WMA Standard
ATRAC3+
LAME 3.97 MP3 (high anchor)
Adobe Audition FhG MP3 (low anchor)
Ogg Vorbis (AoTuV3 or 1.1)

At this point, I would like to ask people again to test between the two Vorbis encoders. If you have time, you can also give Archer a try, but the test should focus on AoTuV3 and 1.1.
westgroveg
What about MP3+? It would be interesting to see whether the HE-AAC encoders can outperform MP3+ yet.
Sebastian Mares
QUOTE(westgroveg @ Mar 27 2005, 09:42 AM)
What about MP3+? It would be interesting to see whether the HE-AAC encoders can outperform MP3+ yet.
*



I suppose you mean mp3PRO... Well, it was tested last time and it performed quite well, but only came third after Nero HE-AAC and the high anchor LAME at 128 kbps.


I will not include it in this test because there are no improvements since the last test and also because it is a pretty rare format with little soft- and hardware support.
vinnie97
There's an AoTuV-prebeta4 to check now, which supposedly resolves some issues at q0. ;) http://www.geocities.jp/aoyoume/aotuv/test.html
guruboolez
I did a small listening test for WMA9 encoders. As samples, I've used all selected by Roberto for his last 128 kbps Multiformat Listening Test.


Two important things:

• I hadn't browsed HA since last Thursday (if decisions were made in this topic during the past week, I wasn't aware of them)
• This listening test was a very fast one. Too fast, I would say. I didn't ABX anything, and I've probably missed some details.


• WMA9Pro is better, but its bitrate doesn't stay near 64 kbps at -q10. Still, WMA9Pro is not that much better.
• Statistically, CBR 2-pass and VBR 2-pass are tied, but CBR 64 kbps 2-pass appeared to be a bit more consistent in quality than VBR at low bitrate.


EDIT: blank log files (no comment, simple notation) are available here.
Sebastian Mares
Thanks for the test guruboolez! Weird that VBR is a bit worse than CBR - didn't expect that. I guess I will use CBR for WMA standard then.
guruboolez
As I said, I was not fully satisfied with this test (too fast, too imprecise). If the collective test doesn't start in the next few days, I think I could test CBR and VBR again, without WMA Pro this time, and with an ABX phase to make sure the differences are audible.