The "average" listener

I know most of you are audiophiles, but does anyone think it's worthwhile doing some testing for people who don't have super equipment or wonderful hearing?

I don't know if this has already been taken into account, but maybe things like --alt-preset standard are overkill for those with "average" setups, which is the majority of people. The results might be useful to incorporate into the recommended settings for general users.

Your thoughts?

Reply #1
How do you decide whether a given person is an "average listener"?

You seem to imply that this would be people that don't have "super equipment" or "wonderful hearing," but what exactly do these things mean?

What is defined as "super equipment"? (It may be worth noting right here in fact that the majority of users on HA, AFAIK, do not have "super equipment," at least in comparison to the equipment that often shows up in "audiophile" magazines in price ranges of thousands to tens of thousands of dollars -- regardless of whether that equipment is actually of high performance or not.)

And how do you define "wonderful hearing"?  This is completely subjective, because you need a baseline to establish a comparison against.  So you arbitrarily choose either 1) some specific listener that you measure others against, and assume that if they can hear what he/she can, or more, that they are "wonderful listeners," or 2) you arbitrarily pick some sort of characteristics from a signal that you feel (for whatever reason) that a listener must be able to discern to be considered part of the "wonderful listener" group.

So, in the end, it's almost completely arbitrary.  This makes such an effort, IMO, pretty useless.  But for a little bit of history, this is just about exactly what Roel tried with the --r3mix switches:  he aimed for "average listener" performance, but the big problem with that was that "average listener" was defined solely by his own whims.  As it turned out, many so-called "average listeners" found that his assumptions were wrong when it actually came time for people to do the listening tests.

Really, it makes much more sense to aim for either "transparency" or "least offensive," both quite simply defined according to a consensus among the largest group of listeners that you can manage to assemble for a given test.  Of course this may not be fully representative for certain groups of listeners (ALL listeners for example, or "average listeners" even), but it's about the best you can do.

Reply #2
I think it would be useful

Reply #3
I should have guessed it would have too many confounders.

Never mind, then.

Reply #4
Not everyone has "super equipment". My setup (in my sig) cost about $750 AUD ($575 USD).

Reply #5
The equipment is largely irrelevant in determining what codec flaws you can hear.

Or at least it is far less relevant than most people think.

The same is true for "wonderful hearing". It has more to do with training and experience. That's why your idea (which, as Dibrom explained, has been tried before) is harmful in the long run.

Reply #6
Some time ago, just out of curiosity, I tried to recognize the flaws in some MP3 test samples (from listening tests) using cheap DAPs. Most of them were clearly noticeable. My conclusion was: even cheap players need well-encoded music.

Reply #7
I think blue57's idea has some merit; it's just his reasoning that is flawed.

A better explanation might be that it has become very difficult to spot encoder artifacts these days without training yourself to hear them. I know that I can't ABX LAME at anything over 130-140 kbps, even on many of the difficult samples. Maybe I have worse hearing than average, but I think that people without artifact training wouldn't do much better.

Personally I think that --preset medium doesn't get nearly enough love here on HA. It is what I use for my DAP, for encodes I send to friends, and the like. (On my main computer I use Musepack.) It may fail more often than standard, but on a large majority of samples, for a large majority of people, it will be transparent.

In some ways, this is a fantastic compliment to the devs of LAME, Vorbis, and AAC. It helps that people are willing to spend more kbps these days than they were years ago, but great improvements have been made even for the people who don't know about anything other than 128.

Reply #8
Usually, detecting artifacts is mostly a matter of "training" - equipment and hearing-abilities are secondary.

Many large-scale tests that have been done (especially Roberto's listening tests) were done at around 128 kbps, so if that is what you're looking for, it already exists. Even more so, since medium-bitrate tests are more common because they are easier to do.

If what you ask about is using "normal samples" instead of "killer samples", then that may be difficult to do. The problem is that all modern lossy codecs are already so good that they are indistinguishable from the original most of the time (i.e. with "normal samples"). So, current listening tests and tunings are mostly about "problem cases" - not about how an encoder performs most of the time.
I am arrogant and I can afford it because I deliver.

Reply #9
Quote
Not everyone has "super equipment". My setup (in my sig) cost about $750 AUD ($575 USD).

I was meaning the people with <$50 setups.. mobo with integrated sound, cheapo speakers, $15 headphones (like myself), who have music on for enjoyment or as a sonic decoration, rather than those who focus on the audio.

Anyway, don't worry about it now. -aps may be overkill for me right now, but at least there's some room for growth in the future.

Reply #10
Quote
I was meaning the people with <$50 setups.. mobo with integrated sound, cheapo speakers, $15 headphones (like myself)

I don't understand - why would someone with such a setup care about "minor" differences in MP3 encoding? It's like going cheap with 90% of what is significant in audio quality, and then being picky about the remaining 10%.

- Lyx

Reply #11
Essentially, to save some disk space.

Reply #12
Quote
I was meaning the people with <$50 setups.. mobo with integrated sound, cheapo speakers, $15 headphones (like myself), who have music on for enjoyment or as a sonic decoration, rather than those who focus on the audio.
If you choose the right pair of $20 headphones you can get plenty enough quality to appreciate better encodes. Integrated sound mobos are more of a problem, but there are plenty of cheap but usable sound cards out there. In fact, for $50 you could have the above headphones and a Chaintech AV710. Plus sound cards can easily be recycled from older computers.

As for only using music for sonic decoration... The problem with a bad encode is that once you notice it, it can be difficult to ignore afterwards.

Quote
Anyway, don't worry about it now. -aps may be overkill for me right now, but at least there's some room for growth in the future.
First of all, don't feel pressured to do anything you don't want to do. Nobody here will disrespect you if you use preset medium instead of standard. If file size is important to you because you have limited space, that should be part of your decision.

Second, I think insurance is a big thing with the HA group. People use EAC not because it is better all the time, but because it is better in the worst-case scenario. They use lossless not because they can tell the difference from a high-quality lossy encode, but because it provides a perfect backup in case something happens to their CD collection. Many here have invested large amounts of money and time in their music collections, so putting extra effort into the computer version thereof is only natural.

Reply #13
Quote
I was meaning the people with <$50 setups.. mobo with integrated sound, cheapo speakers, $15 headphones (like myself), who have music on for enjoyment or as a sonic decoration, rather than those who focus on the audio.

Anyway, don't worry about it now. -aps may be overkill for me right now, but at least there's some room for growth in the future.

You hit the nail on the head right there. Room for growth. Maybe you have what some would consider 'low end' equipment, but say you get a few extra dimes and upgrade your soundcard and speakers? You would want the quality of your encodes to hold up without having to re-encode everything.
--
Eric

Reply #14
Quote
I don't understand - why would someone with such a setup care about "minor" differences in MP3 encoding? It's like going cheap with 90% of what is significant in audio quality, and then being picky about the remaining 10%.

- Lyx

What's wrong w/ someone trying to get the best sounding audio out of his/her 'tin cans' all the while saving some space on the h/d?
I think that's what most, if not all, people are trying to achieve by extensively testing out LAME or any other encoders.
It doesn't matter whether it's cheap or expensive setups; it's what compressed audio's all about.

I'm just seeing it from a different perspective.
If I put myself in a position where I can't spare a dime for audio h/w & h/d space, I know I'll try to get the best of both worlds.
Getting it as small as possible, all the while sounding not too bad on the speakers.

To quote it right off HA's slogan :

the audio technology enthusiast's resource

If blue57 is enthusiastic about the 'best' sounding compression level on a 'tin cans' setup, let's just help him.
Or at least show him how he could help the others, and even himself, out.
Who knows, he might produce his own blind test and become something of a guruboolez-calibre tester for 'tin cans' setups.

Reply #15
I don't think the issues with such a test are insurmountable. As others have mentioned, all that is really necessary is for untrained listeners to do the tests. Along with this we might want to consider relaxing the ABX protocol a bit, because the focus here is on casual listening.

128kbps encoder tests are somewhere around what is requested here, but really, the quality of 128k encodes nowadays is extraordinarily high, and we're not really sure what it means for the average joe. I would not be surprised if lame -V7 turns out to be transparent to some people.

Reply #16
Quote
Along with this we might want to consider relaxing the ABX protocol a bit, because the focus here is on casual listening.


You are scaring me.

Reply #17
Come on! All I'm saying is that if you take a random person off the street and throw them in an ABX test on halfway decent hardware, I would expect them to hear a lot more things than if they just listened to the music casually, even if the test was relatively short. And that's exactly what we're testing, casual listening. In that context an ABX test is testing the wrong thing.

Now, I admit to having absolutely no idea how such a test would operate and still have any power whatsoever, but it's important enough not to discount out of hand.
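For what "power" means here, a rough sketch may help. It assumes a plain ABX run of n independent trials scored with a one-sided binomial test; the function names and the 5% significance level are illustrative choices, not anything specified in this thread.

```python
from math import comb


def binom_tail(n, k, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_score(n, alpha=0.05):
    """Smallest number of correct answers that pure guessing reaches with probability <= alpha.

    Returns None when even a perfect score is not significant (too few trials).
    """
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return None


def abx_power(n, p_true, alpha=0.05):
    """Chance that a listener who is right with probability p_true passes the test."""
    k = critical_score(n, alpha)
    if k is None:
        return 0.0
    return binom_tail(n, k, p_true)


if __name__ == "__main__":
    # Hypothetical listeners: one who barely hears a difference, one who hears it clearly.
    for p_true in (0.6, 0.9):
        print(p_true, round(abx_power(16, p_true), 3))
```

With 16 trials the pass mark comes out at 12 correct. A listener who only hears the difference now and then will usually fail a short test like this, which is one way of seeing why a relaxed protocol loses power so quickly.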

Reply #18
Quote
I don't think the issues with such a test are too insurmountable. As others have mentioned, all that is really necessary is for untrained listeners to do the tests.


So here we go again...

What is an "untrained listener"?  Is this someone who hasn't spent much time listening to lossy audio codecs?  Or is this someone who doesn't listen to music much in general?  Or is this someone who ... ?

I can think of quite a few people who, in other contexts, people would consider "trained" (musicians for example, or stereotypical "audiophiles"), but where such a label doesn't necessarily translate well into this domain.

The problem with that right away is that as soon as you label such a group and use them to make a statement regarding quality derived from the results of their listening test, you're going to have some group complaining about the representativeness of it all.  Ultimately, it'd take a lot of effort to make the arbitrary distinctions necessary to set up the test, and you'd only end up with questionable results.  It's simply not worthwhile.

Quote
Along with this we might want to consider relaxing the ABX protocol a bit, because the focus here is on casual listening.


Why?

Again, you have the same sort of problem.  One person's "casual listening" is going to be completely different from another's.  What you need to do to get a representative result is control this situation as much as possible, and that's what ABX provides for us.

But let's say that somehow, even given these problems, you find a correct way to carry out such a test and to rely on such relaxed conditions.  Why would any developer in their right mind want to waste tuning for such results?  It's a waste of effort because, in the end, people are going to ultimately compare their efforts based on some sort of benchmark, and "casual listening" is hardly a good metric to use for that.  And furthermore, the results themselves are questionable because they lack a level of objectivity that is necessary to really nail down problems in quality and fix them.  If you relax ABX and tune for the "average listener," you'll spend all your time chasing phantoms.

No, the way to do it is the same way I listed earlier.  Then, if you need lower quality, you provide some sort of smooth quality scaling.  Most codecs do that these days anyway (LAME with -V, Vorbis with -q, MPC with --quality).  From here, a particular individual can determine what meets their own "casual" needs through a few simple listening tests.
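As a rough illustration of the "few simple listening tests" step, the sketch below builds one LAME command per -V level for a single sample, so the resulting files can be loaded into an ABX tool of your choice. It assumes a `lame` binary on the PATH; the file name, the output naming scheme and the chosen levels are made up for the example.

```python
import subprocess
from pathlib import Path


def lame_commands(wav, levels=(2, 4, 5, 6)):
    """Build one `lame -V n` command line per VBR quality level for a single WAV file."""
    wav = Path(wav)
    commands = []
    for level in levels:
        # Hypothetical naming scheme: sample.wav -> sample_V4.mp3
        out = wav.with_name(f"{wav.stem}_V{level}.mp3")
        commands.append(["lame", "-V", str(level), str(wav), str(out)])
    return commands


def encode_all(wav, levels=(2, 4, 5, 6)):
    """Run the commands; requires LAME to be installed and the WAV file to exist."""
    for cmd in lame_commands(wav, levels):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch is safe to try.
    for cmd in lame_commands("sample.wav"):
        print(" ".join(cmd))
```

Lower -V numbers mean higher quality and bigger files, so the usual approach is to start at the cheapest setting and move toward higher quality until you can no longer ABX the encode against the original.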

Quote
128kbps encoder tests are somewhere around what is requested here,


And how do you know this?

Reply #19
Quote
And that's exactly what we're testing, casual listening. In that context an ABX test is testing the wrong thing.


Yes, that is the wrong thing in such a scenario.  But so is trying to use such an uncontrolled situation to make a general conclusion, or to use the results of such a situation to perform the type of precision work that is involved with codec tuning.

Quote
Now, I admit to having absolutely no idea how such a test would operate and still have any power whatsoever, but it's important enough not to discount out of hand.


I don't think anyone is discounting it out of hand.  It has been explained why it's not a very workable idea, and furthermore, why it's a bad thing from a quality tuning point of view.  This is an issue that has been discussed in a lot of different contexts in HA's early history.

Reply #20
Quote
What is an "untrained listener"?  Is this someone who hasn't spent much time listening to lossy audio codecs?  Or is this someone who doesn't listen to music much in general?  Or is this someone who ... ?

I can think of quite a few people who, in other contexts, people would consider "trained" (musicians for example, or stereotypical "audiophiles"), but where such a label doesn't necessarily translate well into this domain.

The problem with that right away is that as soon as you label such a group and use them to make a statement regarding quality derived from the results of their listening test, you're going to have some group complaining about the representativeness of it all.  Ultimately, it'd take a lot of effort to make the arbitrary distinctions necessary to set up the test, and you'd only end up with questionable results.  It's simply not worthwhile.


An untrained listener is exactly what the adjective implies: somebody with no experience in detecting encoder artifacts. While I can't offer any evidence that such a listener is particularly fungible (i.e. that musicians, regular listeners and audiophiles would make equally fine untrained listeners), I see no evidence to the contrary either. If such a listener is fungible, then I would argue that the power of a test involving them is going to be OK - not good, not great, but acceptable for the target audience.

In other words, your point about representativeness is cogent, but to the best of my knowledge, not actually validated. In this respect this situation only differs from transparent encoding tests by degrees. Before HA and before ff123 and before the whole listening test era, was it obvious that there is a well-defined boundary of transparency for properly designed encoders? (Actually that isn't a very rhetorical question as I don't know the answer; if it was obvious, then this comparison isn't that valid.)

Quote
Quote
Along with this we might want to consider relaxing the ABX protocol a bit, because the focus here is on casual listening.


Why?

Again, you have the same sort of problem.  One person's "casual listening" is going to be completely different from another's.  What you need to do to get a representative result is control this situation as much as possible, and that's what ABX provides for us.

But let's say that somehow, even given these problems, you find a correct way to carry out such a test and to rely on such relaxed conditions.  Why would any developer in their right mind want to waste tuning for such results?  It's a waste of effort because, in the end, people are going to ultimately compare their efforts based on some sort of benchmark, and "casual listening" is hardly a good metric to use for that.  And furthermore, the results themselves are questionable because they lack a level of objectivity that is necessary to really nail down problems in quality and fix them.  If you relax ABX and tune for the "average listener," you'll spend all your time chasing phantoms.


I agree that tuning for this sort of thing is mostly useless. That is, when you tune an encoder, it is far more effective to tune based on transparency and on 1-5 ranking results from trained ears than on casual listening. For transparency, there is a well-defined and psychoacoustically sound boundary to tune for. For casual listening there isn't, and you'd have some people who can't hear anything and some who are just naturals at telling differences, which makes drawing that sort of a boundary impossible.

If I were to ad-hoc this a bit further, I would argue that this could be worked around by making the result for such a test statistically determined from the distribution of listeners rather than the ABX results themselves. That is, given 20 or so listeners, the final "tuning" is going to be the one that yields "casual listening transparency", whatever that is, for, say, 70% of the listeners (I pulled that number out of a hat).
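A minimal sketch of that idea, assuming each listener has already been reduced to one number: the lowest-quality LAME -V level (highest number) they still found transparent, with every higher-quality level assumed transparent too. The listener data and the function name are invented for illustration, and the 70% default is the same out-of-a-hat figure as above.

```python
def casual_setting(thresholds, fraction=0.7, candidates=range(0, 10)):
    """Pick the lowest-quality -V level that is transparent to at least `fraction` of listeners.

    thresholds: per-listener highest -V number still judged transparent
                (a level v counts as transparent for a listener when v <= their threshold).
    Returns None if no candidate level satisfies enough listeners.
    """
    best = None
    for v in candidates:
        share = sum(t >= v for t in thresholds) / len(thresholds)
        if share >= fraction and (best is None or v > best):
            best = v
    return best


if __name__ == "__main__":
    # Hypothetical results from ten listeners, not real test data.
    listeners = [7, 6, 6, 5, 5, 5, 4, 4, 3, 2]
    print(casual_setting(listeners))        # level covering at least 70% of this panel
    print(casual_setting(listeners, 0.5))   # a laxer target allows a lower-quality level
```

The outcome depends heavily on who is in the panel and on the chosen fraction, which is the representativeness problem raised earlier in the thread.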

Quote
No, the way to do it is the same way I listed earlier.  Then, if you need lower quality, you provide some sort of smooth quality scaling.  Most codecs do that these days anyway (LAME with -V, Vorbis with -q, MPC with --quality).  From here, a particular individual can determine what meets their own "casual" needs through a few simple listening tests.


In all reality these are probably sufficient for most people. The only realistic thing that a casual listening tuning would turn into is a note saying "start at -Vsomething for background music, car music, casual listening or workout music etc", and that is so close to the current recommendations that it's probably not worth pursuing.

Quote
Quote
128kbps encoder tests are somewhere around what is requested here,


And how do you know this?

I admit I don't; I was just guessing. I think I may have confused "casual listening" with "listening at acceptable levels of distortion", which is what the 128 tests have been doing.

This thread is getting a bit out of hand, and the issue itself is kind of moot for me because I would never use such a "casual listening" encode, and Dibrom has made enough good points about the efficacy of all of it, so I'll bow out at this post.

Reply #21
Axon, there is another reason why this would be a waste of resources. You talk about a "relaxed/quick test", which basically just means "half-assed", to "simulate" the level of attention with which the average Joe usually listens to music. Guess what: I'll tell you what the result will probably be, assuming that proper blind-testing methodology is preserved. At -V5 almost none of your "untrained casual listeners" will notice a difference. Heck, they will probably even fail at -V6!

So what does that tell you? Nothing, because people will turn the "goal" of the test into a flaw. They will begin to say, "Oh, but I want to have some safety buffer in case I listen more carefully to music, and I don't want to re-encode my MP3s when I get better equipment." So the logical step for those people is to encode at -V4 (preset medium). HOWEVER, exactly this setting was already considered the "safe setting for average Joe" BEFORE you did this test, so in the end you revealed nothing new and the test was a waste of resources.

It's like many other things: the more effort you spend on a test, the more useful the results; the less effort you spend, the less useful the results. By this I don't mean that a test with "average Joe" should not be done, but you cannot lower the quality of the test without also lowering its usefulness.

Reply #22
Quote
does anyone think it's worthwhile doing some testing for people who don't have super equipment or wonderful hearing?
Quote
I don't understand - why would someone with such a setup care about "minor" differences in MP3 encoding?
Quote

Essentially, to save some disk space.

OK now, saving disk space is actually a practical and sensible goal. Lowering quality just because it won't be missed (due to poor hearing or equipment) is not. If file size is not a matter of concern, then the "excess" audio quality does no harm.

Achieving the best possible quality at given lower bitrates is already a subject that gets a fair amount of attention. Listening tests and encoder (+ settings) recommendations are out there. An individual only needs to do a relatively small amount of personal testing to choose what's acceptable for their particular needs.

Reply #23
I think what is really needed isn't a "casual listening test", but instead V4/preset medium getting a more prominent presentation and attention. I suspect that the high attention which V2 and V0 (standard and extreme) get here is the reason why "casual listeners" feel that their needs aren't covered enough. So, IMHO it is not a lack of knowledge/data which is the problem but instead the "marginalized" presentation and popularity of V4/medium.

Reply #24
Quote
I think what is really needed isn't a "casual listening test", but instead V4/preset medium getting a more prominent presentation and attention. I suspect that the high attention which V2 and V0 (standard and extreme) get here is the reason why "casual listeners" feel that their needs aren't covered enough. So, IMHO it is not a lack of knowledge/data which is the problem but instead the "marginalized" presentation and popularity of V4/medium.

I'm with Lyx here. I can recognize artifacts pretty well by now, but on the vast majority of stuff I can't tell -V4 apart from the original, and I encode most stuff at -V4 or -V3, going with -V2 on something that I'm more likely to be archiving (e.g., when I rip a friend's CD).

Dibrom, I can understand that you've seen this sort of thing before, and I agree that relaxing the ABX standard wouldn't make any sense, but I think that an "untrained listener" isn't so much of a definitional problem....

Overall, I think the best thing is to give more attention to -V4 and -V5.

On a semi-related note, I had a friend over on Wednesday night, a physicist who's an audiophile, really into classical music and obscure metal, and who apparently has really good hearing. He thinks he can tell -V4 encodes apart pretty easily because he's familiar with the music and can sense that it's lacking some high-frequency content. Then it turns out he can't ABX a typical metal sample (the opening of Metallica's "Sad But True") at -V5. Boy, was he squirming, and not because I was pressuring or making fun of him, but just because he realized that he had no idea and was failing a double-blind test on relatively lower-quality encoded music.
God kills a kitten every time you encode with CBR 320