Full Version: Multiformat@128kbps listening test - FINISHED
Hydrogenaudio Forums > Hydrogenaudio Forum > Listening Tests
echo
QUOTE(phong @ May 24 2004, 03:53 PM)
Ooops, should be fixed now.

Yes it is. Thanks. :)
echo
Nope. It just won't work for me. All I get from chunky is
CODE
Parsing result files...
Traceback (most recent call last):
 File "chunky", line 639, in ?
 File "chunky", line 595, in main
 File "abchr_parser.pyc", line 634, in __init__
 File "abchr_parser.pyc", line 646, in _handleTargets
 File "abchr_parser.pyc", line 697, in __init__
abchr_parser.Error: Sample directory names must end in a number.
But the directory names already end in a number! :(
rjamorim
QUOTE(echo @ May 24 2004, 09:27 PM)
But the directory names already end in a number! :(

The folder you run chunky from (the one holding the SampleXX folders) must otherwise be empty.

I.e., no files there, only the 18 folders.
echo
OK the program executed but all I got were files with:
QUOTE
%
% !EMPTY!:

Vorbis MPC Lame iTunes Atrac3 WMA

% Codec averages:
% 0.00 0.00 0.00 0.00 0.00 0.00

huh?
Cygnus X1
QUOTE(rjamorim @ May 24 2004, 04:54 PM)
QUOTE(Latexxx @ May 24 2004, 02:17 PM)
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 and nobody has dared to answer yet. Maybe all the minidisc guys had a heart attack after reading the results.

http://forums.minidisc.org/viewtopic.php?p=22321#22321

Almost 400 page views and not a peep from anybody. I find this sort of response interesting...it's not like we're personally attacking MD. The test simply showed that its performance isn't up to par, fair or not. The exact same thing happened when I used to talk about pre-echo samples with ATRAC Type-R. I have to wonder, though, how many readers of that thread will rush out and buy a 1GB "hi-md" machine once they come out...although ATRAC3plus is technically a different animal (much bigger transform window, etc.) than ATRAC3, my expectations aren't very high for it either.

(Edit: I kant sphell)
rjamorim
QUOTE(echo @ May 24 2004, 09:37 PM)
OK the program executed but all I got were files with:

Oops.

Last detail: the file extension must be .txt :B

So please rename all xmls to txt (ren /s *.xml *.txt)
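For anyone who would rather script the rename than rely on a shell one-liner, here is a minimal Python sketch. It assumes the result files sit somewhere under the folder that holds the SampleXX directories. The function name `rename_extensions` and its parameters are made up for illustration and are not part of chunky.

```python
from pathlib import Path


def rename_extensions(root, old=".xml", new=".txt"):
    """Rename every file under `root` that ends in `old` so it ends in `new`.

    Hypothetical helper, not part of chunky. Returns the list of new paths.
    """
    renamed = []
    # sorted() materialises the listing first, so nothing is renamed mid-walk.
    for path in sorted(Path(root).rglob("*" + old)):
        if path.is_file():
            target = path.with_suffix(new)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `rename_extensions(".")` from the test folder would rename the result files in every SampleXX subfolder. Try it on a copy of the results first.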
echo
QUOTE(rjamorim @ May 24 2004, 04:57 PM)
Oops.

Last detail: the file extension must be .txt :B

So please rename all xmls to txt (ren /s *.xml *.txt)

Got it! Finally! :D Thanks a lot Roberto. Here are some character graphs with confidence intervals included. I will make proper graphs tomorrow after I get some sleep if you like (it's already 5:00 in the morning here! :P)

Bartok (p=0.851)
CODE
Level       N      Mean     StDev  -----+---------+---------+---------+-
Atrac3     16    4.3125    1.2894   (-----------*-----------)
iTunes     16    4.6438    0.9040           (-----------*-----------)
Lame       16    4.4688    0.8845      (------------*-----------)
MPC        16    4.5438    0.9736        (------------*-----------)
Vorbis     16    4.7375    0.5439             (-----------*------------)
WMA        16    4.4125    1.1621     (-----------*------------)
                                  -----+---------+---------+---------+-
Pooled StDev =   0.9880               4.00      4.40      4.80      5.20


Leahy (p=0.532)
CODE
Level       N      Mean     StDev  -------+---------+---------+---------
Atrac3     12     3.758     1.672   (---------*--------)
iTunes     12     4.242     1.161          (---------*--------)
Lame       12     4.108     1.157        (---------*--------)
MPC        12     4.408     0.955            (---------*---------)
Vorbis     12     4.683     0.876                (---------*---------)
WMA        12     4.367     1.130            (--------*---------)
                                  -------+---------+---------+---------
Pooled StDev =    1.186                 3.50      4.20      4.90


Mahler (p=0.660)
CODE
Level       N      Mean     StDev  ---------+---------+---------+-------
Atrac3     12     3.617     1.777  (----------*---------)
iTunes     12     4.092     1.328         (---------*----------)
Lame       12     4.167     1.170          (----------*---------)
MPC        12     4.517     0.735               (----------*---------)
Vorbis     12     4.292     1.076            (---------*----------)
WMA        12     4.142     1.323          (---------*----------)
                                  ---------+---------+---------+-------
Pooled StDev =    1.274                   3.50      4.20      4.90


Ordinary world (p=0.846)
CODE
Level       N      Mean     StDev  ------+---------+---------+---------+
Atrac3     13    4.4769    0.7939       (------------*------------)
iTunes     13    4.3846    1.0808     (------------*------------)
Lame       13    4.4538    0.8866      (------------*------------)
MPC        13    4.7077    0.6396             (------------*------------)
Vorbis     13    4.7077    0.7065             (------------*------------)
WMA        13    4.3077    1.3775   (------------*------------)
                                  ------+---------+---------+---------+
Pooled StDev =   0.9478                4.00      4.40      4.80      5.20
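The "Pooled StDev" line in these tables can be checked by hand. The sketch below assumes the tables are ordinary one-way ANOVA output with the intervals built on the pooled standard deviation, which is what the layout suggests but is not stated in the post. The t value used for the interval half-width is an approximate critical value for 90 degrees of freedom and is an assumption too.

```python
import math

# Per-codec standard deviations for the Bartok sample, copied from the table.
bartok_stdev = {
    "Atrac3": 1.2894,
    "iTunes": 0.9040,
    "Lame": 0.8845,
    "MPC": 0.9736,
    "Vorbis": 0.5439,
    "WMA": 1.1621,
}
N_PER_CODEC = 16  # listeners per codec in the Bartok table


def pooled_stdev(stdevs, n):
    """Pooled standard deviation for groups that each have n observations."""
    stdevs = list(stdevs)
    sum_sq = sum((n - 1) * s ** 2 for s in stdevs)
    return math.sqrt(sum_sq / (len(stdevs) * (n - 1)))


def ci_half_width(pooled, n, t_crit):
    """Half-width of a confidence interval built on the pooled StDev."""
    return t_crit * pooled / math.sqrt(n)


pooled = pooled_stdev(bartok_stdev.values(), N_PER_CODEC)
print(f"Pooled StDev = {pooled:.4f}")

# Assumed value: roughly the two-sided 95% t quantile for 6 * 15 = 90 degrees of freedom.
T_CRIT_APPROX = 1.987
print(f"CI half-width = {ci_half_width(pooled, N_PER_CODEC, T_CRIT_APPROX):.3f}")
```

The pooled value agrees with the 0.9880 printed under the Bartok table. With equal group sizes it is simply the square root of the mean variance.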
eagleray
Good work Roberto.

The results are an upset win for Ogg Vorbis, and show a significant improvement in the venerable LAME MP3 as well.
kwanbis
Good work, Roberto... I will continue to use LAME, then, until iRiver properly supports Vorbis ;)
gkmeyer
Okay, now that it is over, I want to make a few points.

I really couldn't tell the difference between any of these codecs and the wav at this bitrate. The first thing that will come to many of your minds is equipment, but I am using a decent pair of headphones (Grado SR-60's) and although I am working without a headphone amp on a Thinkpad which uses the Intel 855 AC'97 codec, things sound pretty good.

A few times I thought I heard a difference, but when I tried to ABX the set I was not successful. At one point I deleted my whole results fileset, thinking that what I would be submitting wouldn't be acceptable. However, I reconsidered and figured I would go ahead and submit them with everything scored a five.

I am relatively new to this, and what is interesting to me is that when I use the myriad of training materials that exist, I can successfully hear the artifacts and problems when I am told what to listen for. I successfully ABX these samples 100% of the time. However, in a blind test, when I don't know what I am listening for, I cannot hear a difference. I know quality plays a role here, but I am thinking my problem is that I don't have the attention span and attention to detail necessary (which would be consistent with how I react to other stimuli) to be good at this.

I would be very interested in hearing some isolated artifacts on a few of these samples so I can try to hear what I missed. It's been a learning experience anyway, thanks for letting me participate.
Daijoubu
QUOTE(bond @ May 23 2004, 11:50 PM)
Wow, now that's what I did not expect.

- Vorbis aoTuV: Vorbis is back, and I am proud to have helped find out which Vorbis encoder should be used :)
- MPC vs AAC: funny that MPC was that much better than iTunes (with a setting only 0.15 higher than in the last test)
- WMA9: lol, worse than MP3! (and I'm even surprised that it got rated that high; even at 128 it had this metallic sound sometimes) -> go away m$
- ATRAC3: even worse than WMA9 -> go away Sony

And if you take this test as a comparison between some online music stores (iTunes vs. WMA9-based ones vs. Sony's new store), iTunes clearly comes out as the winner, leaving WMA9 far behind!

ATRAC3plus is their new codec ;)

I do understand that the Sony Connect service currently offers ATRAC3, so stick to other formats for those 99-cent purchases :P

There are replies:
http://forums.minidisc.org/viewtopic.php?p=22345#22345
http://minidisct.com/forum/showthread.php?threadid=22995

Which implementation of ATRAC3 did this test use?
I only see flac decompressing the wav, where does it originate from?
The hardware encoder and the encoder in SonicStage may lead to different output.
Halcyon
Daijoubu,

I also had a hard time finding artifacts on all the samples that I had time to listen to.

However, as in most sensory skills, practise makes perfect, so don't be too worried about your hearing being bad (in an absolute sense).

I've also trained using my own samples, lame/ff123/vorbis/klemm/vqf/MUS420/AES test samples, but clearly I still have a long way to go myself to properly hear even the most obvious of artifacts.

You are also right in assuming that attention plays a part in sensory detection. You will hear/see/taste/feel more if you know what to "look" for.

You can use the user comments to find problem parts in samples:

http://www.rjamorim.com/test/multiformat12...s/comments.html

I tried to put what goes wrong, how and at what time in the playback of the sample (didn't do this to even half the samples though) for _my hearing_.

Using those comments from various testers, it is possible to guide your attention to listen for something specific and just repeat a certain part of the sample.

Note, however, that people are sensitive to different artifacts. I went through some of ff123's/Pio's comments and I hadn't paid any attention to some of the stuff they heard and found annoying.

Guess I'll have to train some more :)

best regards,
halcyon
anishbenji
The test was mentioned on the Screensavers last night. Not much was said beyond mentioning that the winners were Ogg Vorbis and Musepack. A link to the results was included in the show notes.
Patrick said that he didn't know how the tests were conducted (e.g. if they were blind etc), and that he was planning to download the test and try it out himself.
earwax
QUOTE(rjamorim @ May 24 2004, 12:06 AM)
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?

54

Mind you, I didn't discard results that ranked the reference if the listener also ABXed that sample pair to a p-value of 0.05 or less.

Does that mean that all of the users' rankings for that sample were thrown out or just for the codec(s) where they ranked the reference?

Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC that would be somewhat interesting.

I guess I'm just looking for some information that would make me more comfortable that throwing out those 54 didn't distort results in any obvious way.
Aoyumi
:o I did not expect aoTuV to take first place.
This is a delightful miscalculation. :D

@Acknowledgement
First, I am very grateful to the people who carried out the spontaneous comparison tests and to the friends who helped tune Vorbis.

I am also thankful to the people of Xiph.org, including Monty, who created libvorbis (and Ogg), which is the code base of aoTuV. Vorbis is a wonderful format!

Finally, I am thankful to all the people involved in this test.
:)
Halcyon
Three things perhaps worth considering in the future. I noticed these myself, but I'm not sure others see them as important:

1) ABCHR Java version has, imho, some issues:

- buffer length that is small enough for fast switching can cause a lot of skips/gaps on playback (at least on my system. I think Gabriel may also have mentioned this?)

- Often, I found that I was about to save a test result from the software with accidental "rankings". That is, I'd sometimes click accidentally on a sample that I knew was the reference, moving its slider from 5.0 to 4.9 (this happens for instance when I switch from the sound card volume mixer window back to ABCHR and accidentally click on the slider for a sample that I didn't mean to rate). Now, the problem is that this change from 5.0 to 4.9 on the UI itself is so small that it almost went unnoticed by me on various occasions. So, I was about to save/return results where I had erroneously/unintentionally ranked the reference/original sample (even though I had ABXed the reference from the test sample). This would have led to the results being discarded (i.e. wasted time for me and loss of data for the test). I wish there was a clear indicator (color or something) that showed when any of the sliders had been moved from the reference position (even if only 0.1 points). Just a UI design issue and minor at that, but it can lead to discarding of perfectly "good" data.

- It is impossible to select the output sound card and/or the output method (DirectSound/WaveOut/ASIO/Kernel). On cards that have broken DirectSound (like RME DIGI 96/8), this makes it hard/useless to use that card. I had to resort to my worse sound card, worse headphone amplifier and worse headphones due to this. Not that it necessarily altered my listening accuracy at all, but it was a bummer not to be able to use gear one is accustomed to. I wonder if there is any way around this limitation?

2) Intro: I've been reading Les Leventhal's AES papers, like "Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests". Mr Leventhal is a psychologist who understands auditory testing and statistical analysis issues on the subject of significance levels (I recommend: J. Audio Eng. Soc, Vol 34, No 6, 1986 June as a starting point. He has further papers on the issue). While statistical analysis is not a substitute for a carefully thought out research methodology and test setup, it can help to analyse non-ideal settings with higher confidence.

Suggestion, considering Leventhal's points and the impossibility of making a perfect test: most tests don't even have an openly formulated research question, let alone an analysis of the testing methodology with reference to that question. Both could further validate the results AND limit the scope of the conclusions that can be drawn from them.

With these in mind I'd suggest considering the use of fairness coefficients in the significance calculation (especially in tests that have very small audio impairments and a low likelihood of detectability). Neurological research on how auditory evoked responses diminish with repeated tests also appears to support this conclusion.

3) For general use, for learning how to listen / train one's hearing, and for tests like the last 128 kbps test, could we build some general guidelines on how to conduct listening tests alone? That is, after somebody has provided the samples and the software, how does one actually carry out the listening and ranking in order to get the best out of it?

This could include issues like volume setting, selecting a good time to test, pros/cons of repeated fast switching, reinforcement of the neutral reference, attitudinal motivation, attention guiding, etc. All these can have a slight and in some cases a dramatic effect on the overall results (not necessarily changing any codec rankings, but enabling testers to find more artifacts). I already know of ff123's fine pages and they could serve as a starting point. We could inject some basic tips there culled from cognitive, neurological and audiological research. And your personal experience of course.

Unfortunately I'm not much of a person to help with issues 1 & 2 any further, but maybe others can consider them for future alterations, if they feel they are important.

In 3 I could perhaps contribute, if others are interested.

Would this be a good Wiki project? Should we start a new thread to discuss this, if there are any interested parties?


regards,
halcyon
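Point 2 is about the trade-off between type 1 and type 2 errors. The sketch below puts numbers on it for a single ABX run. It is only an illustration of that trade-off, not Leventhal's own procedure and not a fairness-coefficient calculation. The 16-trial run length and the listener abilities are assumed values.

```python
from math import comb


def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_score(n, alpha=0.05):
    """Smallest number of correct answers that pure guessing reaches with probability <= alpha."""
    for k in range(n + 1):
        if tail_prob(n, k, 0.5) <= alpha:
            return k
    return None  # the run is too short to ever reach alpha


n_trials = 16  # assumed run length
k_crit = critical_score(n_trials)
type1 = tail_prob(n_trials, k_crit, 0.5)
print(f"need {k_crit}/{n_trials} correct, type 1 error = {type1:.4f}")

# Hypothetical listeners who really hear the difference on a fraction of trials.
for true_p in (0.6, 0.7, 0.8, 0.9):
    type2 = 1 - tail_prob(n_trials, k_crit, true_p)
    print(f"true hit rate {true_p:.1f}: type 2 error = {type2:.3f}")
```

The loop shows how often a listener with a real but modest ability would still fail the run, which is the risk that grows when the impairments are small.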
phong
QUOTE(rjamorim @ May 25 2004, 12:57 AM)
Oops.

Last detail: the file extension must be .txt :B

So please rename all xmls to txt (ren /s *.xml *.txt)

I think that's my "oops." I thought I made it accept the .arf and .xml extensions... I'll fix that and a couple other little things and upload a new version tonight sometime.
Mac
With issue 1, why not just invoke a dialog box saying "Are you sure you want to change the ranked file for sample x" if you try switching from rating A to B? That way you could click [Oh crap nooooo] and not lose your previous rating.

I would appreciate some advice on testing dos and don'ts: basic technique combined with what to look for.
ff123
QUOTE(Halcyon @ May 25 2004, 07:37 AM)
Three things perhaps worth considering in the future. I noticed these myself, but I'm not sure others see them as important:

1) ABCHR Java version has, imho, some issues:

- buffer length that is small enough for fast switching can cause a lot of skips/gaps on playback (at least on my system. I think Gabriel may also have mentioned this?)

I am slowly working towards implementing encryption in abchr for windows. The first part of that is to be able to decode xml setup files. I'm currently figuring out how to use expat/arabica to implement a document object model for xml (yes, I could have used msxml.dll, but I want something that works for all windows users without having to ask them to install updated dlls).

Hopefully being able to use a native windows app on pc/windows systems should take care of the clicking issue.

QUOTE
- It is impossible to select the output sound card and/or the output method (DirectSound/WaveOut/ASIO/Kernel). On cards that have broken DirectSound (like RME DIGI 96/8), this makes it hard/useless to use that card. I had to resort to my worse sound card, worse headphone amplifier and worse headphones due to this. Not that it necessarily altered my listening accuracy at all, but it was a bummer not to be able to use gear one is accustomed to. I wonder if there is any way around this limitation?


In Java, you are restricted to the Java sound library. In abchr for Windows, I only implemented waveOut playback, which is probably the most compatible method for existing PCs (plus it was convenient to use the MCI interface). I don't have plans to implement DirectSound or ASIO playback.

QUOTE
2) Intro: I've been reading Les Leventhal's AES papers, like "Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests". Mr Leventhal is a psychologist who understands auditory testing and statistical analysis issues on the subject of significance levels (I recommend: J. Audio Eng. Soc, Vol 34, No 6, 1986 June as a starting point. He has further papers on the issue). While statistical analysis is not a substitute for a carefully thought out research methodology and test setup, it can help to analyse non-ideal settings with higher confidence.


This sounds interesting. I should note that the method Roberto uses for analyzing the results favors finding differences at the expense of higher type I errors -- i.e., it does not correct for multiple samples.

Currently the biggest remaining criticism I see of Roberto's tests is not statistical. I think the bitrate criticism should be tackled head on in future tests. Bitrates over multiple albums and bitrates over the sample set should be about the same, IMO. So that means choosing samples which might not at first glance appear to be "difficult."

You ask how does the test method affect the test? Well in this case, we have self-selected listeners and an abc/hr test method. The self-selection is probably amplifying the differences. In the general population, I'd bet the vast majority of people would not find the differences this group of listeners has.

The abc/hr and abx test methods are also very sensitive, and certainly not representative of real-world listening. I think they also have a tendency to over-amplify differences (although those differences are very real). Bottom line -- these tests are, if anything, too sensitive to represent everyday listening for the general population.

But for the people who actually care, they do a pretty good job of providing information on differentiating codec quality at a very subtle level.

ff123
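To put a number on the remark about not correcting for multiple samples: if each of the 18 per-sample analyses is run at the 5% level and there were no real differences anywhere, the chance of at least one false positive is much higher than 5%. The sketch below assumes the analyses are independent, which is a simplification, and shows the Bonferroni correction only as one common remedy, not as the method used in this test.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / m


ALPHA = 0.05
for m in (1, 6, 18):  # 18 = number of samples in this test
    print(
        f"{m:2d} tests: familywise error = {familywise_error(ALPHA, m):.3f}, "
        f"Bonferroni per-test alpha = {bonferroni_alpha(ALPHA, m):.4f}"
    )
```

For 18 tests the familywise error comes out at about 0.60, which is the sense in which the uncorrected analysis favors finding differences.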
earwax
This test was discussed slightly on the MiniDisc TBoard too -
http://www.minidisct.com/forum/showthread....&threadid=22995
rjamorim
QUOTE(Daijoubu @ May 25 2004, 06:08 AM)
Which implementation of ATRAC3 did this test use?

SonicStage2

QUOTE
I only see flac decompressing the wav, where does it originate from?


Decoding the Atrac3 and encoding to FLAC. There's no other way to distribute the Atrac3 samples.

QUOTE
Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC that would be somewhat interesting.


Hrm... you would have to check the output of Chunky with the command line I posted earlier to see what results are being discarded, and then analyze these results one by one.
Toe
I gotta say this has done a lot to restore my confidence in Vorbis, and I'm probably not the only one. Mad props to Aoyumi.

Given that LAME is now nipping at the heels of iTunes at 128k, I do really have to wonder about the AAC encoders that didn't win the last AAC listening test. I wouldn't be surprised if LAME is now ahead of or at least tied with Nero AAC. Who wants to test? :D
Canar
QUOTE(DAvenger @ May 24 2004, 11:43 AM)
Vorbis winning (I know, MPC is very close) a listening test? :o That hasn't happened for a long time :lol:

As I said on #foobar2000 to someone saying the same thing: Learn stats, and post again.

I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win. Statistically speaking, it's more likely Vorbis is better than MPC than the other way around, but you cannot say that Vorbis won with any level of certainty. It'd be like 60% probability Vorbis is better, and 40% probability Musepack is better (I pulled these numbers out of my ass for a visual example). I believe this test was run with a confidence level of 95%. Am I correct?
rjamorim
QUOTE(Canar @ May 25 2004, 03:05 PM)
I believe this test was run with a confidence level of 95%. Am I correct?

Erm... it's in the results page. Read the second sentence of "How to interpret the plots:"

Now, officially they are tied. But considering Vorbis' score is above MPC's confidence margin, I would say, with some confidence, that Vorbis aoTuV is better than MPC, at this bitrate.
JohnV
I'd like to see a double test sometime. Meaning, another test after the first one with another set of samples, to see how close the final results are to each other.
rjamorim
QUOTE(JohnV @ May 25 2004, 03:53 PM)
I'd like to see a double test sometime. Meaning, another test after the first one with another set of samples, to see how close the final results are to each other.

You are invited to conduct it :)
Daijoubu
QUOTE(rjamorim @ May 25 2004, 09:17 AM)
QUOTE(Daijoubu @ May 25 2004, 06:08 AM)
Which implementation of ATRAC3 did this test use?

SonicStage2

QUOTE
I only see flac decompressing the wav, where does it originate from?


Decoding the Atrac3 and encoding to FLAC. There's no other way to distribute the Atrac3 samples.

Real time recording via internal loopback? :blink:
rjamorim
QUOTE(Daijoubu @ May 25 2004, 04:15 PM)
Real time recording via internal loopback? :blink:

Total Recorder.
jormartr
It seems people outside HA do not understand the language I speak: they interpret 'yes' as 'no' and vice versa, 'better' as 'worse', and 'scientific, objective and replicable by yourself' as 'my mother did it and she owns the truth' :blink:
Mac
QUOTE(JohnV @ May 25 2004, 06:53 PM)
I'd like to see a double test sometime. Meaning, another test after the first one with another set of samples, to see how close the final results are to each other.

Couldn't you just split the current test into two 9-sample tests and pretend one was taken after the other? Comparing the results of these two 'sub-tests' would in effect be the same, and you've got the benefit of 49'000 different ways to create two sub-tests. I would imagine from the previous comments that you wouldn't find a great amount of discrepancy; I got the impression that gone are the days when WMA was best for classical and MP3 best for metal (or whatever the codec/genre associations were).
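The 49'000 figure is close to the number of ways to choose 9 of the 18 samples for the first sub-test. A small sketch of the count and of drawing one random split follows. The helper name `random_split` is made up for illustration.

```python
import random
from math import comb

N_SAMPLES = 18
HALF = N_SAMPLES // 2

# Ways to pick which 9 samples form the "first" sub-test.
print(comb(N_SAMPLES, HALF))
# If the two halves are interchangeable, each split is counted twice above.
print(comb(N_SAMPLES, HALF) // 2)


def random_split(samples, rng=random):
    """Shuffle the samples and cut them into two equal halves (hypothetical helper)."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


first, second = random_split(range(1, N_SAMPLES + 1))
print("sub-test A:", first)
print("sub-test B:", second)
```

The first count is 48620. Repeating `random_split` many times and comparing the codec rankings of the two halves would give a rough picture of how stable the results are across sample sets.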
SirGrey
QUOTE
It seems people outside HA do not understand the language I speak: they interpret 'yes' as 'no' and vice versa, 'better' as 'worse',

Calm down... :P
People who WANT to understand will understand.
People who do not care will not...
There is an old Russian saying; I will try to translate:
When you argue with a fool, take care: other people might see no difference [between you] :lol:
Canar
QUOTE(rjamorim @ May 25 2004, 10:21 AM)
Erm... it's in the results page. Read the second sentence of "How to interpret the plots:

Now, officially they are tied. But considering Vorbis' score is above MPC's confidence margin, I would say, with some confidence, that Vorbis aoTuV is better than MPC, at this bitrate.

Haha, yeah, figured I could find it out in a few moments, but I didn't really have them when I posted.

You make sense with the confidence margin thing, true, but you're likely going to start confusing the less statistically minded unless you stick pretty hardcore to the 95% confidence interval information. Either that, or you qualify the hell out of any statement that doesn't comply with the 95% interval.
rjamorim
QUOTE(Canar @ May 25 2004, 06:13 PM)
You make sense with the confidence margin thing, true, but you're likely going to start confusing the less statistically minded unless you stick pretty hardcore to the 95% confidence interval information.

It's no use. This test is not controlled enough to warrant sticking to the 95% confidence level as if it were gospel. Unlike in ITU tests, I have no control over participants' listening environment, equipment, training, fatigue, etc. (and that's why ITU tests are damn expensive).

These results are there just to give an idea of how codecs rank. They are not trying to be definitive in what they report. And people should still test for themselves to decide what codec better suits them, and consider other features like availability, hardware support, etc.
kwanbis
QUOTE(Canar @ May 25 2004, 06:05 PM)
I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win.

Even if they are tied (Vorbis and MPC), it means that Vorbis (and possibly MPC) won.
earwax
QUOTE(rjamorim @ May 25 2004, 09:17 AM)

QUOTE
Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC that would be somewhat interesting.


Hrm... you would have to check the output of Chunky with the command line I posted earlier to see what results are being discarded, and then analyze these results one by one.

Oh, OK, I thought someone may have actually looked at those discarded results already.

Maybe someone can answer the other part of my question:

Does that mean that all of the users' rankings for that sample were thrown out or just for the codec(s) where they ranked the reference?
yoth.
OK, so it looks like the VBR contenders did very well and iTunes's CBR held its own. How safe would it be to assume that VBR AAC (for instance the most recent FAAC with FB2K) would be a contender?
Mono
QUOTE(Daijoubu @ May 25 2004, 03:08 AM)

This guy's signature is great!
QUOTE
Best portable setup = 128kbps MP3 (super high quality, > CD!) -> transcoded to the best codec in the world, uber high quality ATRAC3/LP4 (5000% better than SACD) -> NetMD (faster than ur sh*tty firewire) -> N710 (EU version with 1.2mW x2, OH YEAHHHH BABY!) + MDR-E808 (bestest hedfonez in teh world!)
This will shizz on all ur lame iPods! Its sooooo clear dat I can almos feel teh mud flwing dwn da waterfal!

Worst portable setup = CD -> WAV -> (WaveGain @ 87dB) -> iTunes 4.5/QT 6.5.1 encoded 224kbps AAC or ALAC -> 3G iPod + Etymotic ER-4P

I am actually impressed with most responses there, but apparently some believe that the test is not fair because ATRAC was not tested on a preferred hardware DAC. :rolleyes:
rjamorim
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense. The opposite actually makes more sense. On hardware, you must be worried about real time encoding, voltage consumption and battery consumption. On software, you can go nuts.

So, if Sony cut corners somewhere, it must have been on hardware due to inherent limitations.
Cygnus X1
QUOTE(rjamorim @ May 25 2004, 05:18 PM)
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense. The opposite actually makes more sense. On hardware, you must be worried about real time encoding, voltage consumption and battery consumption. On software, you can go nuts.

So, if Sony cut corners somewhere, it must have been on hardware due to inherent limitations.


Even worse is the fact that some people claim ATRAC3 sounds "better" decoded through Type-S or their 1-bit digital amps, so the test is therefore invalid :sick:

I don't think that some people understand the point of comparing lossy codecs: it's not to see which one sounds "warm" or "fat" or "has better bass," it's to compare artifacts, with the best codec having the least number of and/or least annoying artifacts. I want to smack people when they claim that while ATRAC3 sounds worse than MP3 on the computer, it will sound better going through their 1-bit digital amp. NOOOO!!! :mad: An artifact is an artifact...a phasey cymbal or dropout will still be there no matter how good your amp or boost is. I'm personally surprised that although many people claim to be able to discern the "higher quality" of a 1-bit digital amp on certain players, they apparently aren't able to pick out what are sometimes blatant artifacts. I wonder how much of that can be attributed to marketing?
ff123
Replies gathering at:

http://microsoftusernetwork.com/forum/viewtopic.php?p=275

where the response from the forum moderator is surprisingly dismissive. Oh well.

ff123
QuantumKnot
QUOTE(kwanbis @ May 26 2004, 07:27 AM)
QUOTE(Canar @ May 25 2004, 06:05 PM)
I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win.

Even if they are tied (Vorbis and MPC), it means that Vorbis (and possibly MPC) won.

I agree. To have a winner, you must have a loser (or losers). And there are some notable losers in this test (i.e. ATRAC3). Since Vorbis and MPC are statistically tied, they both won over the rest.
phong
I've uploaded chunky-0.8.4 which fixes the filename extension problem. Also, I've changed the default behavior so that it discards files with ranked references (i.e. -p 0.0 is assumed unless specified otherwise).

You can get it, as usual, at http://www.phong.org/chunky/

QUOTE(earwax)
Does that mean that all of the users' rankings for that sample were thrown out or just for the codec(s) where they ranked the reference?

Yes, the whole result for that sample is thrown out. To do otherwise would taint the results. Even if you just guessed without listening, you would get about half of them right - if you just discarded the wrong ones, you'd still have half left with completely invalid ratings. The only safe route is to toss the whole result file.

On the other hand, it is possible that the reference was ranked inadvertently even if they did hear a difference (if it was very subtle). In those cases (i.e. when the differences are subtle), it's best to make an ABX test - if you are successful, the ranked reference won't cause it to be discarded. If you fail the ABX test, then you know you probably didn't hear a difference and you shouldn't rank the sample at all (leave it at 5.0).
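The rule described above can be sketched in a few lines. This is not chunky's code and the data layout is invented for illustration: each entry is assumed to hold the score given to the hidden reference plus an optional ABX tally for that codec. Only the rule itself (a ranked reference discards the whole result unless a successful ABX rescues it) is taken from the posts.

```python
from math import comb


def abx_pvalue(correct, trials):
    """Chance of getting at least `correct` of `trials` ABX trials right by guessing."""
    return sum(comb(trials, i) for i in range(correct, trials + 1)) / 2 ** trials


def keep_result(entries, p_threshold=0.05):
    """Return True if one listener's result file for one sample should be kept.

    entries: list of dicts with the hypothetical keys
      "reference_score": score given to the hidden reference (5.0 = left alone)
      "abx": (correct, trials) tuple, or None if no ABX run was made
    """
    for entry in entries:
        if entry["reference_score"] < 5.0:
            abx = entry.get("abx")
            # A ranked reference is only tolerated with a convincing ABX run.
            if abx is None or abx_pvalue(*abx) > p_threshold:
                return False  # one bad entry discards the whole result
    return True


clean = [{"reference_score": 5.0, "abx": None}]
ranked = [{"reference_score": 4.9, "abx": None}]
rescued = [{"reference_score": 4.9, "abx": (12, 16)}]
print(keep_result(clean), keep_result(ranked), keep_result(rescued))
```

With `p_threshold=0.0` no ABX run can rescue a ranked reference, since the guessing probability is never zero. That seems to match the new default described above, though how chunky interprets `-p` internally is an assumption here.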
Canar
QUOTE(QuantumKnot @ May 25 2004, 04:08 PM)
I agree. To have a winner, you must have a loser (or losers). And there are some notable losers in this test (i.e. ATRAC3). Since Vorbis and MPC are statistically tied, they both won over the rest.

I meant that Vorbis didn't win when compared to Musepack. That's all. I didn't mean globally. Sorry for the confusion.
StoneRoses
QUOTE(rjamorim @ May 26 2004, 05:18 AM)
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense.

How can you jump straight to a conclusion like that?

The hardware ATRAC3 encoder in an MD player may use a different codebase from its software counterpart.

SonicStage may be to ATRAC3 something like what Blade is to MP3. We have to test it.
rjamorim
QUOTE(StoneRoses @ May 26 2004, 04:13 AM)
How can you jump straight to a conclusion like that?

The hardware ATRAC3 encoder in an MD player may use a different codebase from its software counterpart.

SonicStage may be to ATRAC3 something like what Blade is to MP3. We have to test it.

<sigh>

Have you ever even bothered reading the rest of my post?

Here, let me give you some knowledge. That way, you will think twice before posting next time:

On hardware, a developer must be concerned about constraints like power and battery consumption, real-time encoding, less precision (no FPU), a fraction of the CPU clock speed, etc.

On software, the developer can go nuts since none of those restrictions apply.

In codec development, the usual path is to first create a software implementation (which will also be used later for compliance tests), and then start cutting corners and complexity for the hardware version until it reaches the desired performance.

FOR THAT REASON, I claim it's nonsense. I don't claim it's impossible, maybe Sony has some serious voodoo going on there. But it does go against common sense.

Common sense is that they aren't deliberately putting a worse version of Atrac3 "like blade is for MP3" on SonicStage for kicks and giggles.

You're welcome.

Regards;

Roberto.
Big_Berny
Well, I'm not sure about MPC.
Before the test I mentioned that MPC is perhaps only this good because it uses very high bitrates on these problem samples! But if the average bitrate is 128 kbps for the tested quality setting, there should also be a lot of samples with bitrates under 128 kbps! Logical, isn't it?
The problem with this test is that most samples had high bitrates, and the samples with low bitrates were not rated as highly!

For example, you could also modify an MP3 encoder to use very high bitrates (160 kbps) on difficult samples and very low ones (80 kbps) on normal samples. In this test it would probably do better than the current LAME encoder, but in practice there would be a lot of songs that would sound very bad!
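To put numbers on that hypothetical encoder, here is a small sketch. The 160/80 figures are the ones from the example; the sample mixes and the function name `average_bitrate` are invented purely for illustration and are not taken from the test.

```python
def average_bitrate(n_difficult, n_normal, difficult_kbps=160, normal_kbps=80):
    """Mean bitrate over equally long samples for the hypothetical encoder."""
    total = n_difficult * difficult_kbps + n_normal * normal_kbps
    return total / (n_difficult + n_normal)


# The mix at which this encoder would average exactly 128 kbps.
print(average_bitrate(n_difficult=60, n_normal=40))  # 128.0

# A made-up collection that is mostly normal material.
print(average_bitrate(n_difficult=10, n_normal=90))  # 88.0

# A made-up test set dominated by problem samples.
print(average_bitrate(n_difficult=16, n_normal=2))   # about 151.1
```

So the same encoder can look like a 151 kbps codec on a set of hard samples while averaging far less on everyday music.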

I hope you understand what I mean. But perhaps my idea is totally wrong!?

Big_Berny
Kblood
Congratulations to Roberto for once more pulling through a tough one. Great work, greatly appreciated.

Regarding the bitrate "issue"...

The encoders in the test were using standard settings, they were not specially tweaked for the test, and you can go ahead and use them with your songs.

So if some of the encoders have flawed code for choosing the bitrate in tough passages of music, well, that's their problem.

I think this test is really useful as an indication of which encoder does a better job with a setting that ends up averaging 128kbps across a whole bunch of music. And I fail to see what's wrong with the idea.
kalmark
Just to throw in my 2 cents regarding the ATRAC and WMA forums' responses:

I think we all more or less knew that there would be such a reaction when we posted these results. I even think some hoped for such a reaction, so they could say that these people make unsupported claims and such.

I myself trust the results, though I won't change my encoding habits: Lame aps for me, as I only have an mp3-capable portable. The people in the ATRAC/WMA forums won't change theirs either, IMHO, as they paid a lot of money to be able to use the formats they defend now.

And, to be honest, the people who really care about audio quality end up at HA eventually wink.gif And those who don't care help the sales of devices that only support lower-quality codecs skyrocket, because they only listen to the commercials.

And don't tell me you didn't read the "2 cents" warning smile.gif

One more on-topic question: is it possible to send these results to portable manufacturers? Would it make sense to start a thread for collecting contact email addresses, so we could mail most portable manufacturers and give them a hint about what to develop? E.g. Daisy MM (manufacturer of the Diva) wrote to me in an email that they would consider implementing further codecs if the licensing fees are fine. So why not give the companies a hint?
StoneRoses
QUOTE(rjamorim @ May 26 2004, 02:24 PM)
Have you ever even bothered reading the rest of my post?

I did read your post on the minidisc forum (and agree with you on that point) before posting that.

My point is that if they (minidiscers) claim that their MD hardware encodes better, then we should consider their claim, similar to how we selected the best encoder for the other codecs in your test.
Digga
QUOTE(StoneRoses @ May 26 2004, 11:58 AM)
My point is that if they (minidiscers) claim that their MD hardware encodes better, then we should consider their claim, similar to how we selected the best encoder for the other codecs in your test.

Consider and then dismiss, if there is no proof for the claim other than general subjective opinions.
If there is (semi-)scientific proof, it will be gladly accepted.

guess what's gonna happen.