Multiformat 48 kbps Listening Test, Pre-Test Discussion, Take 2 - And Action! |
Nov 5 2006, 18:20
Post
#126
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
Well, if differences are only marginal and don't affect quality, I guess we should go with WMCmd.vbs because it can be used for batch encoding. WMP does not encode to Q10 WMA Standard. I could also use Winamp, but I just uninstalled it again.
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 5 2006, 23:10
Post
#127
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
I was talking with Roberto about the problems of testing WMA 2-pass VBR the other day and was wondering about one thing: is only 2-pass VBR affected by the issue I described here, or does this actually affect all VBR modes? Therefore, I asked both Ivan and Gabriel how their VBR implementations work and whether it is true that "free" VBR will always allocate the same number of bits to a given sample, regardless of whether it is encoded as part of a full song or as-is, as an already extracted part of a track. While Ivan confirmed my initial thoughts, Nero producing two more or less identical encodes, Gabriel said this is not the case with LAME. He explained that LAME uses a variable ATH level whose value is based on the previous loudness. Therefore, encoding a full track is not the same as encoding a sample: even if VBR is used, the sample encoded as-is will not be the same as the sample encoded from the whole track.
I am now wondering how big the effect is. Does this "news" render all previous listening tests based on samples useless with regard to LAME? This post has been edited by Sebastian Mares: Nov 5 2006, 23:12 -------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 5 2006, 23:34
Post
#128
|
|
![]() Group: Members (Donating) Posts: 3474 Joined: 7-November 01 From: Strasbourg (France) Member No.: 420 |
It's for that reason that Gabriel suggested 2 years ago (and has repeated from time to time) that testers should discard the first one or two seconds from the tested files.
And if I remember correctly it was done for the last listening tests (an option allows this in ABC/HR). It needs to be confirmed by Gabriel anyway. This post has been edited by guruboolez: Nov 5 2006, 23:37 |
|
|
|
Nov 5 2006, 23:38
Post
#129
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
OK, so it's not something that affects the whole encode, but only the first few samples.
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 6 2006, 10:34
Post
#130
|
|
|
Nero MPEG4 developer Group: Developer Posts: 1466 Joined: 22-September 01 Member No.: 8 |
@Sebastian,
I would not treat LAME's variable ATH as such a problem for the listening test. The fact is that many psychoacoustic models take previous samples into account - and it is not just variable ATH. For example, there is the temporal post-masking phenomenon, which creates different bit distributions for a given sample based on the loudness of past samples. However, this phenomenon is very local in time: the maximum duration is approx. 200 ms (unless the encoder is buggy). Also, some encoders use time-domain methods to estimate the tonality of the signal. For example, if the masker has behaved unpredictably in the time domain in the past, the encoder might judge the masker as being "noisy", and this can mean a difference of up to 20+ dB in the masker power. Additionally, with SBR you might get slightly different results, as there is usually a small "SBR Reset" flag sent every second or so (depending on the encoder). The difference between two encodings of the same sample located in different regions is not big, but it is definitely there. These are just a few factors that might cause samples to be encoded with a different quantization resolution depending on the past samples. However, all of these differences are IMHO not so relevant for a listening test. I think just adding 2-3 seconds of "run-in" is more than enough to make a fair test. |
|
|
|
Nov 6 2006, 11:24
Post
#131
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
QUOTE I think just adding 2-3 seconds of "run-in" is more than enough to make a fair test. Isn't cutting the first two seconds off in the ABC/HR options the usual practice? However, this may be a problem with very short samples or samples that start with an audio signal that is meant to demonstrate a specific problem. Here is an example of such a sample: http://www.hydrogenaudio.org/forums/index....st&p=420360 The first two or three seconds seem to be problematic for all MP3 encoders at about 128 kbps. The sample is also from the very beginning of a real audio track, so it is not artificial. Perhaps a few seconds of some PCM material could be added before the sample, but should this be digital silence or some average audio material? Would a few seconds of silence make the encoder behave differently when the real sample starts? If the sudden signal change alters the encoding result, we would need to know what the encoder "default" is before it starts adjusting its parameters, and use an audio signal that would not change this default if possible. Edit Naturally it is possible to decode the sample and add an audio signal after that. The only downside would be the larger file size of the lossless test sample. This post has been edited by Alex B: Nov 6 2006, 11:34 -------------------- http://listening-tests.freetzi.com
|
|
|
|
Nov 6 2006, 11:40
Post
#132
|
|
![]() LAME developer Group: Developer Posts: 2950 Joined: 1-October 01 From: Nanterre, France Member No.: 138 |
Most modern audio and video encoders will produce different results based on previous samples. It might be because of detection methods (predictability, ATH level,...) or because the encoder is "learning" (mostly video encoders).
In both cases, discarding a few seconds at the start (the discarded data being similar to the tested range - ie no "scene cut") is enough to compensate for this behaviour. In the Java ABC/HR, up to now, we have to trick it by adjusting the "sample delay" by 2 seconds. (It would be nice to be able to specify a testing range instead of this hack.) |
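The run-in idea discussed above can be sketched in a few lines of Python. This is only an illustration, not how ABC/HR implements its delay setting: the function name `drop_run_in` and the assumption that the decoded audio is a flat list of interleaved PCM samples are both made up for this example.

```python
def drop_run_in(samples, sample_rate, channels, run_in_seconds=2.0):
    """Return interleaved PCM samples with the first run_in_seconds removed.

    Hypothetical helper: 'samples' is assumed to be a flat sequence of
    interleaved PCM values (L, R, L, R, ... for stereo).
    """
    # One frame holds one value per channel, so skip whole frames only.
    frames_to_skip = int(round(run_in_seconds * sample_rate))
    return samples[frames_to_skip * channels:]


# Toy input: 3 seconds of "stereo audio" at an artificial 10 Hz sample rate.
pcm = list(range(3 * 10 * 2))
tested_range = drop_run_in(pcm, sample_rate=10, channels=2)
print(len(tested_range))  # 20 values remain, i.e. the last second
```

The encoder still sees the full file, so its adaptive state has settled by the time the tested range starts; only the listener's comparison skips the run-in.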
|
|
|
Nov 6 2006, 12:06
Post
#133
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning.
If some other "too short" sample is from the middle of the audio track, a longer passage of the same track should be encoded. At least it should start more than two seconds before the intended sample starting point.* Edit: *If preferred, this type of encoded sample can be cut to the intended length after decoding. In this case at least two seconds of silence must be added at the beginning if the sample is going to be used with the two second Java ABC/HR delay setting. This post has been edited by Alex B: Nov 6 2006, 12:27 -------------------- http://listening-tests.freetzi.com
|
|
|
|
Nov 6 2006, 12:28
Post
#134
|
|
![]() LAME developer Group: Developer Posts: 2950 Joined: 1-October 01 From: Nanterre, France Member No.: 138 |
QUOTE So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning. No. You encode it as it is, and do not test the first 2 seconds, or you add two seconds of something at the beginning, encode it, and do not test the first 2 seconds. (The first solution is highly preferable.) |
|
|
|
Nov 6 2006, 12:38
Post
#135
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
QUOTE So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning. No. You encode it as it is, and do not test the first 2 seconds, or you add two seconds of something at the beginning, encode it, and do not test the first 2 seconds. (The first solution is highly preferable.) The example sample demonstrates a problem in the first few seconds of the track. It represents a real life situation. Just try, for example, the L3enc version I uploaded. The guitar chords at the very beginning are very bad. I am not removing the first two seconds when I listen to this track outside a listening test. -------------------- http://listening-tests.freetzi.com
|
|
|
|
Nov 6 2006, 14:19
Post
#136
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
Out of curiosity, I tried the first three seconds of this AC/DC sample with aoTuV b5 @ -q-1, Nero AAC @ ABR 48kbps and l3enc MP3 @ 128 kbps.
Foobar ABX result was 10/10 for all three when compared with the reference. In my opinion Vorbis and l3enc produced unusable quality. Nero AAC was much better, I would say "slightly annoying". Edit: I used "-br 48000" with Nero Digital cl encoder v. 1.0.0.2. This post has been edited by Alex B: Nov 6 2006, 14:47 -------------------- http://listening-tests.freetzi.com
|
|
|
|
Nov 6 2006, 16:46
Post
#137
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
Ivan, do you still recommend ABR or is it OK if VBR is used?
Does anyone mind the following settings:
Ogg Vorbis AoTuV: AO; aoTuV b5 [20061024] (based on Xiph.Org's libVorbis): q-1.0
Nero HE-AAC: Nero AAC codec / May 1 2006: VBR, Q0.20
WMA Standard: Windows Media Audio 9.2: VBR Quality 10, 44 kHz, stereo 1-pass VBR
WMA Professional: Windows Media Audio 10 Professional: 48 kbps, 44 kHz, 2 channel 16 bit 1-pass CBR
The settings were chosen so that all encoders reach more or less the same bitrate with my material. Bitrate tables are welcome. Edit: WMA Professional will reach 48 kbps with all material because it encodes with CBR. The other encoders produce ~50 kbps. If the developers and the majority of the community agree with this, I suggest we start discussing samples. Should we use some samples from the HE-AAC test? I also have some files I would like to post (in case I didn't already), like a Vangelis and a Uriah Heep one. This post has been edited by Sebastian Mares: Nov 19 2006, 19:12 -------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 6 2006, 17:08
Post
#138
|
|
|
Nero MPEG4 developer Group: Developer Posts: 1466 Joined: 22-September 01 Member No.: 8 |
QUOTE Ivan, do you still recommend ABR or is it OK if VBR is used? I'm fine with both - ABR should provide less quality deviation, but VBR should score a bit higher on average. Up to you guys. |
|
|
|
Nov 6 2006, 17:36
Post
#139
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
Sorry, but I am afraid I did not understand. What do you mean by "ABR should provide less quality deviation"?
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 6 2006, 18:13
Post
#140
|
|
|
Nero MPEG4 developer Group: Developer Posts: 1466 Joined: 22-September 01 Member No.: 8 |
I meant - ABR quality (subjective grade) is more consistent, with "shorter" confidence intervals than VBR at that bitrate.
This is because VBR mode could undercode some samples, and they would then sound slightly worse than when they are coded with ABR mode. However, on average VBR is indeed a bit better. This post has been edited by Ivan Dimkovic: Nov 6 2006, 18:17 |
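To make the "higher average but wider confidence interval" point concrete, here is a rough Python sketch. The scores below are invented purely for illustration and are not results from any listening test, and the 95% interval uses a simple normal approximation rather than whatever analysis the actual tests apply.

```python
import math
import statistics


def mean_and_ci95(scores):
    """Return (mean, half-width of an approximate 95% confidence interval).

    Uses the normal approximation 1.96 * s / sqrt(n); this is an assumption
    made for the sketch, not the method used by any real listening test.
    """
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


# Invented per-sample grades on a 1.0-5.0 scale (hypothetical data).
abr_scores = [3.0, 3.1, 2.9, 3.0, 3.0]   # consistent across samples
vbr_scores = [3.6, 3.5, 2.4, 3.7, 3.3]   # better on most, one undercoded sample

abr_mean, abr_ci = mean_and_ci95(abr_scores)
vbr_mean, vbr_ci = mean_and_ci95(vbr_scores)
print(f"ABR: {abr_mean:.2f} +/- {abr_ci:.2f}")
print(f"VBR: {vbr_mean:.2f} +/- {vbr_ci:.2f}")
```

With these made-up numbers, VBR has the higher mean, but the single undercoded sample widens its interval considerably, which is the trade-off described above.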
|
|
|
Nov 6 2006, 18:35
Post
#141
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
QUOTE You encode it as it is, and do not test the first 2 seconds or You add two seconds of something at the beginning, encode it, and do not test the first 2 seconds. (first solution is highly preferable) Gabriel, but what if a song doesn't start "fading in" but like Alex B pointed out with the AC/DC sample? -------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 6 2006, 18:45
Post
#142
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
How many samples should we use, 12?
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 7 2006, 00:13
Post
#143
|
|
|
Winamp Developer Group: Developer Posts: 662 Joined: 17-July 05 From: Ashburn, VA Member No.: 23375 |
|
|
|
|
Nov 7 2006, 00:18
Post
#144
|
|
|
Group: Members Posts: 1315 Joined: 3-January 05 From: Argentina, Bs As Member No.: 18803 |
|
|
|
|
Nov 7 2006, 09:37
Post
#145
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
Well, I think 18 samples is maximum.
-------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 7 2006, 10:24
Post
#146
|
|
![]() LAME developer Group: Developer Posts: 2950 Joined: 1-October 01 From: Nanterre, France Member No.: 138 |
QUOTE Sorry, but I am afraid I did not understand. What do you mean by "ABR should provide less quality deviation"? What Ivan is saying is that he's not totally confident in his VBR mode ;-) Full VBR is a matter of trusting your psymodel, which most of the time is not perfect. If your codec is efficient enough compared to competitors, it's usually safer to rely on ABR (ie VBR is not worth the risk if you are good enough). (Now you know why iTunes is ABR and not fully VBR, and why it is recommended to use LAME in VBR.)
QUOTE Gabriel, but what if a song doesn't start "fading in" but like Alex B pointed out with the AC/DC sample? If you really want to test the start of your sample, you have two choices:
* re-rip the samples with 2 extra seconds at the beginning
* add 2 seconds of silence at the start of the sample |
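The second choice, adding silence before encoding, could look roughly like this in Python. The helper name `prepend_silence` and the flat interleaved-sample representation are assumptions for this sketch; real tools would operate on WAV files instead.

```python
def prepend_silence(samples, sample_rate, channels, seconds=2.0):
    """Return a new list with 'seconds' of digital silence before the samples.

    Hypothetical helper: 'samples' is assumed to be interleaved signed PCM,
    where digital silence is the value 0 on every channel.
    """
    silent_frames = int(round(seconds * sample_rate))
    return [0] * (silent_frames * channels) + list(samples)


# Toy input: 1 second of "stereo audio" at an artificial 10 Hz sample rate.
original = [7] * (1 * 10 * 2)
padded = prepend_silence(original, sample_rate=10, channels=2)
print(len(padded))  # 60 values: 2 s of silence followed by 1 s of audio
```

The padded file is what gets encoded; the tester then skips the first two seconds, so the very first notes of the original sample fall inside the tested range.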
|
|
|
Nov 7 2006, 12:46
Post
#147
|
|
|
Nero MPEG4 developer Group: Developer Posts: 1466 Joined: 22-September 01 Member No.: 8 |
QUOTE Full VBR is a matter of trusting your psymodel, which most of the time is not perfect. If your codec is efficient enough compared to competitors, it's usually safer to rely on ABR (ie VBR is not worth the risk if you are good enough). Actually, looking here: http://www.hydrogenaudio.org/forums/index....showtopic=41191 it looked like Nero VBR @48 kbits/s was just a bit better than ABR. However, at such a low bit-rate I don't believe there are big benefits to using true VBR - there is not much room to scale the bit-rate down before the sound starts to degrade a lot - which means that there won't be room to scale it up, either, in case of need. So, ABR should do just fine. |
|
|
|
Nov 7 2006, 13:52
Post
#148
|
|
![]() Group: Members Posts: 3620 Joined: 14-May 03 From: Bad Herrenalb Member No.: 6613 |
OK, ABR for Nero then. If everything else is fine, we should focus on samples now.
This post has been edited by Sebastian Mares: Nov 9 2006, 07:23 -------------------- http://listening-tests.hydrogenaudio.org/sebastian/
|
|
|
|
Nov 7 2006, 14:18
Post
#149
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
We should remember that in the test we should use a setting that should produce the best average quality with various complete audio tracks. So if Ivan recommends ABR to users who are going to encode a complete audio library at about 48 kbps then it should be used.
If the recommendation is VBR, then it should be tested even if a certain set of selected test samples would possibly result in slightly better quality in ABR mode... * Edit * ... or when the ABR mode would be a safer choice for winning this particular test, like Gabriel explained. This post has been edited by Alex B: Nov 7 2006, 14:28 -------------------- http://listening-tests.freetzi.com
|
|
|
|
Nov 12 2006, 13:06
Post
#150
|
|
![]() Group: Members Posts: 1303 Joined: 14-September 05 From: Helsinki, Finland Member No.: 24472 |
Here's a bitrate table and graph in Excel format. I used my usual set of 25 various full length tracks:
bitrates_48kbps_test.xls
Average bitrates:
Nero Digital 1.0.0.2 -br 48000 => 48 kbps
Nero Digital 1.0.0.2 -q 0.21 => 50 kbps
Nero Digital 1.0.0.2 -q 0.20 => 48 kbps
WMA 10 Pro CBR 48 kbps => 48 kbps
WMA 9.2 standard VBR10 => 47 kbps
Vorbis aoTuV beta 5 -q -1 => 49 kbps
Some of you may find the following screenshot interesting too. Some track peaks of my test file set, starting from the highest peak: ![]() Any comments? EDIT I tested Nero -q 0.2 and changed Nero -q 0.205 to -q 0.2 since it is the selected test option (it was: Nero -q 0.205 => 49 kbps). Also the linked Excel file is updated. This post has been edited by Alex B: Nov 22 2006, 18:42 -------------------- http://listening-tests.freetzi.com
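For anyone who wants to build a similar bitrate table, the arithmetic can be sketched as follows. This is not necessarily how the table above was produced: the function names are made up, the file sizes and durations are invented, and the result includes container overhead because it is derived from total file size.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bitrate of one encoded file in kbps (1 kbps = 1000 bit/s)."""
    return file_size_bytes * 8 / duration_seconds / 1000


def overall_bitrate_kbps(tracks):
    """Duration-weighted average over (file_size_bytes, duration_seconds) pairs.

    Weighting by duration is an assumption of this sketch; a plain mean of
    the per-track bitrates would give a slightly different number.
    """
    total_bits = sum(size * 8 for size, _ in tracks)
    total_seconds = sum(duration for _, duration in tracks)
    return total_bits / total_seconds / 1000


# Invented example: a 4-minute track that encodes to 1,440,000 bytes.
print(average_bitrate_kbps(1_440_000, 240))                    # 48.0
print(overall_bitrate_kbps([(1_440_000, 240), (600_000, 100)]))  # 48.0
```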
|
|
|
|