Full Version: Transparent encoding for older ears?
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - Tech
buzzy
Given the reality that I can't hear anything over 16 kHz, and it's even less than that at any reasonable dB level, I'm wondering what a safely transparent encoding method is for older ears. (I'd like to use one approach for both portable and hi-fi listening, so I don't want to waste too much storage space.)

The technical question is - which of these scenarios fits with how LAME encodes in the V2 to V5 range of presets? At some point, does the model start to look for places to trim the bitrate that affect more than just the high frequencies, or sounds that are otherwise psychoacoustically expendable?

1 - Can I just use the presets? Is the savings in bitrate just coming from the lowpass filter taking out the high frequencies as described in the LAME wiki page? If so, I could safely use V4 (or 5 or 6, though I might not). Or

2 - Do I need to use switches? Are there compromises being made in the lower bitrate presets that affect the sub-16 kHz frequencies? If so, can I get the result I'm looking for by adding a lowpass switch to a preset (if the presets still allow switches, in a way that will encode properly)? Or alternatively, if just using the presets means it will still encode the higher frequencies when it's not too difficult for it to do, can I use a lowpass to make sure it doesn't do that?

While I won't miss the high frequencies, I am sensitive to artifacts, so would want to safely avoid those.

No doubt this has been asked before, but I've searched around a fair amount and didn't see the answer, so feel free to just point me to it if you know where it is. The FAQ thread on the Y switch seems too old to rely on without double checking. Thanks.

May be a common enough topic to add to the list of recommended settings.
shadowking
There is no 100% safe encoding, at least with mp3. Artifacts might not have anything to do with the >16 kHz region. You just have to find a good enough setting where *most* stuff sounds transparent, and you do this with ABX testing. Personally, V5 sounds good enough for portable use, but I can pick it out here and there. V3 is usually transparent enough for me. I would transcode to -V5 for portable use; otherwise I'd use -V3 for PC and portable and keep a lossless archive.

No need for switches. -Y is already enabled in the presets below V2 in quality (V3 and up in number).

Just test a dozen or so tracks you know.
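
For anyone wondering how to read an ABX score, here is a minimal Python sketch, not taken from any ABX tool. It assumes each trial is an independent 50/50 guess when no difference is audible, and computes how likely a given score would be by luck alone. The function name `abx_p_value` is made up for illustration.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more right answers,
    # then divide by the 2**trials equally likely guess sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Example: 12 right out of 16 trials.
print(f"{abx_p_value(12, 16):.4f}")
```

A small value suggests the difference was really heard; a score near half right says nothing either way, so a setting only counts as "transparent for you" on the tracks you actually tested.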
Porcupine
QUOTE(buzzy @ May 13 2007, 18:14) *
At some point, does the model start to look for places to trim the bitrate that affect more than just the high frequencies
I would say the answer to that is 'definitely.' Given that you have encoding criteria that is not the typical norm, I think you should indeed use switches. I would add --lowpass 16000 if you really think you can't hear those frequencies and you are the only intended listener for your files. Although LAME naturally starts sacrificing the higher freqs at the lower quality settings, it's still not as complete a sacrifice as --lowpass 16000, so you should add it.

Everything else should probably be normal, and you just set the V setting or CBR/ABR bitrate to a level where you don't notice artifacts. The fact that you use --lowpass 16000 should help your encodings have fewer artifacts.
boojum
I am 67. I use V2 and let it go at that.
garym
I use V2 and play through portable and through a medium quality home system. With 48,000 files I've never noticed any issues and can't even ABX a lossless file as compared to my V2 files when played through my home system with a benchmark DAC1 (tried jazz, rock, country....don't have much classical so can't speak to that). I can't hear anything above about 14k. Too many front row rock concerts in the 1960s.... I suspect V4 or V5 would probably be fine too, but the V2 is a compromise for me between lossless and a really small file size.

QUOTE(boojum @ May 13 2007, 22:41) *

I am 67. I use V2 and let it go at that.

buzzy
QUOTE(Porcupine @ May 13 2007, 23:36) *
I think you should indeed use switches. I would add --lowpass 16000 if you really think you can't hear those frequencies and you are the only intended listener for your files. Although LAME naturally starts sacrificing the higher freqs at the lower quality settings, it's still not as complete a sacrifice as --lowpass 16000, so you should add it.
Thanks Porcupine, I'll try that on a few test tracks. It wasn't even clear to me whether the presets accepted switches these days, or would encode properly when using them, and it doesn't seem to be documented. (For that matter, I couldn't find anything about the Y switch, either, except mention in threads.)

shadowking, transcoded files are where I have heard the most obvious artifacts, though usually low bit rate stuff that I did mp3 > wma for an old 64MB (!) portable I had years ago. That's extreme, but I'm still convinced that transcoding is a stopgap, last resort approach. And I'd much rather have just one encode to use on portables and at home, even if that means I have to rotate files on and off the portable now and then because they're a little bigger than ideal for a portable.

BTW, I will say the new presentation about which preset to use in which context is great. I spent years trying to get people to understand that they shouldn't get too worked up about lossless vs. mp3 if they were listening in their car or on the street.

QUOTE(garym @ May 14 2007, 12:23) *

I use V2 and play through portable and through a medium quality home system. With 48,000 files I've never noticed any issues and can't even ABX a lossless file as compared to my V2 files when played through my home system with a benchmark DAC1 (tried jazz, rock, country....don't have much classical so can't speak to that). I can't hear anything above about 14k. Too many front row rock concerts in the 1960s.... I suspect V4 or V5 would probably be fine too, but the V2 is a compromise for me between lossless and a really small file size.
Thanks. All true, but based on what people have said, it seems like using v1 or v2 and a lowpass might give better perceived quality in less space.

QUOTE(boojum @ May 14 2007, 00:41) *
I am 67. I use V2 and let it go at that.
Well, storage gets cheaper all the time. But ironically, by "older" I meant more like 37, not 67. (It probably even applies to 27. I just hope more people are getting the word about protecting their hearing these days.) So I figure it's just wasted space, and I'd rather just have one encode to use for everything.

[image]

- from this NY Times link - let me know if the image isn't showing up.

I don't think most people realize how high 16 kHz is, or how loud the music needs to be for frequencies above that to come through. Even if you can hear them, you need to be listening at fairly loud levels.

Interesting hearing test link - but make sure your headphones actually reproduce the frequencies, some don't.

QUOTE(Porcupine @ May 13 2007, 23:36) *
Given that you have encoding criteria that is not the typical norm
Actually, for lots of people encoding for their own use - I'd say maybe this should be the norm, people just don't realize it!

Though maybe they've been reluctant to include it, as it only provides fuel to the anti-mp3 crowd. By this time, though, that crowd has shrunk to a hard core, given that everyone has portables.
buzzy
BTW, it seems like it might be worth thinking about using lowpass width, too.

QUOTE
--lowpass <freq> frequency(kHz), lowpass filter cutoff above freq
--lowpass-width <freq> frequency(kHz) - default 15% of lowpass freq
A bit of documentation for the Y switch ... from running lame --longhelp

QUOTE
experimental switches:
-Y lets LAME ignore noise in sfb21, like in CBR
pdq
A word of warning about the -Y switch - its effect is very much dependent on which version of LAME you are using, and in fact its effect in some versions is the opposite of its effect in other versions. It is described as "experimental" with good reason.
SirChristof
I've seen a lot of topics similar to this one, with people wondering what "good" settings would be for them. Which encoding settings to use can be a confusing question to answer, especially for newcomers.

My suggestion is to follow one of two philosophies (or a combination of them).


-------------------------------------------------------------------
Option 1: Decide exactly how much you need, with a tiny "safety overhead", and use no more.
With this approach, you would first decide "what you need", in this case "Transparent encoding for older ears". To find out "what you need" in this case, you really have no choice but to do some ABX tests, even if limited, to decide what is and is not acceptable to you. Assuming we are using -V presets in LAME, it would be adequate to choose a level you find transparent, then use one level above that for your overhead.

So if you did several ABX tests on some of your favorite songs, and they were all transparent at -V 5, except for one that you ABX'ed which became transparent at -V 4, using -V 2 would be a solid "safety overhead".
-------------------------------------------------------------------
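
Option 1 can be written down as a tiny rule. This is only a sketch under stated assumptions: the list holds the highest -V number you found transparent on each test track, a lower -V number means higher quality, and the helper name `preset_with_margin` is invented for illustration. With a margin of one step the example above would land on -V 3; the -V 2 suggested there corresponds to a margin of two steps.

```python
def preset_with_margin(transparent_levels, margin=1):
    """Pick a -V level from per-track ABX results plus a safety margin.

    transparent_levels: for each tested track, the highest -V number
    (lowest quality) that was still transparent to the listener.
    """
    if not transparent_levels:
        raise ValueError("need at least one ABX result")
    # The most demanding track decides: it needs the lowest -V number.
    strictest = min(transparent_levels)
    # Step towards higher quality by `margin`, but never past -V 0.
    return max(0, strictest - margin)


results = [5, 5, 5, 4]  # three tracks fine at -V 5, one needed -V 4
print(preset_with_margin(results))            # one step of headroom
print(preset_with_margin(results, margin=2))  # two steps of headroom
```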


-------------------------------------------------------------------
Option 2: Decide exactly how many resources you want to dedicate to the task, with a tiny "safety overhead", and use no more.
In this instance, you decide, based on your current resources and the resources you plan to acquire in the future, what you can afford to allocate to this task. Work out how many CDs you have, along with how quickly you acquire new ones on average, and decide how much space you could dedicate to them. So if you say, "I have a 300GB hard drive that I can dedicate to it", and you have roughly 400 CDs, you could simply go lossless for them. This circumvents any "worries" or "questions", since we establish that you have the resources available. Even if lossless is not an option, what is generally viewed as an "overkill" preset (such as -V 0) could safely be used if you had the room to spare. Logically, -V 0 will not sound WORSE than -V 5 (or anything lower), so if the space is available and you can dedicate it to audio, then make it so.
-------------------------------------------------------------------
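
The arithmetic behind the 300GB / 400 CD example can be checked with a short Python sketch. Assumptions: "300GB" is taken as 300e9 bytes, the CDs are standard audio CDs (44.1 kHz, 16-bit, stereo), and nothing is assumed about how well lossless compression does, since that varies by album. The function names are invented for illustration.

```python
# Uncompressed CD audio: 44,100 samples/s, 2 bytes per sample, 2 channels.
CD_AUDIO_BYTES_PER_SECOND = 44_100 * 2 * 2


def budget_per_cd_bytes(disk_bytes, cd_count):
    """Average space available per CD if the disk is shared evenly."""
    return disk_bytes / cd_count


def minutes_of_pcm(byte_count):
    """How many minutes of uncompressed CD audio fit in byte_count."""
    return byte_count / CD_AUDIO_BYTES_PER_SECOND / 60


per_cd = budget_per_cd_bytes(300e9, 400)
print(f"{per_cd / 1e6:.0f} MB per CD, "
      f"about {minutes_of_pcm(per_cd):.1f} minutes of uncompressed audio")
```

The per-CD budget comes out close to the size of a full disc even before any compression, which is why lossless is comfortable in that example. It leaves no room for future purchases, so the "how quickly you acquire new ones" part still needs its own allowance.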


Personally, I use a combination of both philosophies. For albums I have that are merely "another disc in the stack", I use Option 1. For albums I really like or otherwise am a fan of, I use Option 2. In my case, this translates to a combination of -V 5, -V 2, and lossless encodings. Big big favorites get FLAC. These I love and listen to often. Albums I like but don't necessarily love get -V 2. I may not be able to ABX them, but remember: they won't sound worse. Bulk albums that I have but probably won't listen to often get -V 5. I can ABX some of them. Others are transparent. I don't care because it saves a ton of space. I think a "combo-attack" to the problem utilizing both philosophies is best, but feel free to pick one and just go with it. Your results should be excellent.
Porcupine
QUOTE(buzzy @ May 14 2007, 12:31) *
I don't think most people realize how high 16kHz is, or how loud the music needs to be for frequencies above that to come through. Even if you can hear them, you need to be listening at fairly loud levels for them to come through.

Actually, for lots of people encoding for their own use - I'd say maybe this should be the norm, people just don't realize it!
16 kHz is a sound that all of us should be very familiar with because it is the sound that a regular standard definition tube television makes whenever it is on. We probably hear this sound more than any other sound! Or, at least we used to before HDTVs and Flat-Panel TVs came along.

For people with undamaged youthful hearing, 16 kHz is almost as sensitive as a normal frequency. Your own chart should help suggest that, 60 dB is typical talking volume, and I guess those teenagers can hear 17 kHz easily at typical talking volume, otherwise they wouldn't use those ringtones.

The other idea is that since they live in New York, which may be noisy and rowdy outdoors, it's a good idea to use a 17 kHz ringtone because a high frequency carries through the noise of a crowd much MUCH better than any normal sounds (like talking to your friend, etc). You can probably be in a room so rowdy that you can't hear what your friend next to you is saying but you can still hear a moderately soft (60 dB) 17 kHz tone because it's not masked by similar (crowd) frequencies.

When I was younger, I often knew if someone had a (regular tube) TV on or not from outside someone's house or room. The sound even carries through the walls and/or open windows much better than any other sound. I wouldn't be able to hear anything about what show they are watching (the audio is all drowned out by outdoor noise) but I could still hear the TV noise. My hearing is not nearly that good these days though (and one of my ears is around 50 dB more sensitive than the other at 16 kHz, wah, and even my hearing in the good ear has been damaged also).

Nowadays there are still a few CRT HDTVs hanging on to life, and one easy way to tell if a tube TV is an SDTV or HDTV is just to listen to the noise it makes when it is on; you don't even have to look at the TV. If you hear a noise then it's an SDTV, if no noise then HDTV (unless you can hear 35 kHz). Oh yeah, I forgot, I guess if you live in Europe things are different because their SDTVs have a different frame rate and resolution, so the freq is different. I'm not sure what it is (maybe it's similar, since the frame rate is lower but the resolution higher). I have no idea what HDTVs are like in Europe; maybe the world finally agreed to one standard for HDTV.
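
The SDTV whine is the horizontal line frequency, which is just total scan lines per frame times frames per second. A minimal Python sketch using the standard broadcast figures (525 lines at 30000/1001 frames per second for NTSC, 625 lines at 25 frames per second for PAL); the function name is invented for illustration:

```python
from fractions import Fraction


def line_frequency_hz(total_lines, frames_per_second):
    """Horizontal scan rate: every line of every frame is drawn once."""
    return total_lines * frames_per_second


# NTSC colour: 525 total lines, 30000/1001 (about 29.97) frames per second.
ntsc = line_frequency_hz(525, Fraction(30000, 1001))
# PAL (most of Europe): 625 total lines, 25 frames per second.
pal = line_frequency_hz(625, 25)

print(f"NTSC: {float(ntsc):.0f} Hz, PAL: {float(pal):.0f} Hz")
```

So the European figure is indeed similar: the lower frame rate and higher line count nearly cancel out, and both systems whine a little below 16 kHz.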

I guess another use for TV hearing power (for teenagers) is if you go into a big store like a Wal-Mart you can instantly know where the electronics are because you just go towards the TV 16 kHz noise you hear. I used to use this ability all the time when younger. I also tried to use it in shopping malls to find the electronics stores or arcades, but sometimes I would make mistakes.

Also the TV noise usually changes when you change channel, so you can tell when someone is changing the channel from far away. I have no idea how useful that is though.

I kind of agree with you about --lowpass 16000 probably being what most people should do if it's only for their own use, though. The only reservation I have is that I don't think it's been scientifically proven that, even if you are deaf above 16000 Hz, filtering at 16000 Hz with a digital polyphase filter leaves the sound perceptually unchanged. I would still worry that maybe transient response is sacrificed if you do that (even if you can't hear a pure tone of a high-freq sound, I'm not sure that proves you can't hear the effects of "high-freq" transients). But it's not been scientifically proven that you CAN, either. So it's just all unknown to me, do whatever you want.
odious_m
QUOTE(SirChristof @ May 14 2007, 18:15) *


. . . . to choose a level you find transparent, then use one level above that for your overhead.

So if you did several ABX tests on some of your favorite songs, and they were all transparent at -V 3, except for one that you ABX'ed which became transparent at -V 4, using -V 5 would be a solid "safety overhead".



V5 is below V4. I believe that you meant to say that V2 would be the "safety overhead"--no?
SirChristof
QUOTE(odious_m @ May 14 2007, 21:09) *

QUOTE(SirChristof @ May 14 2007, 18:15) *


. . . . to choose a level you find transparent, then use one level above that for your overhead.

So if you did several ABX tests on some of your favorite songs, and they were all transparent at -V 3, except for one that you ABX'ed which became transparent at -V 4, using -V 5 would be a solid "safety overhead".



V5 is below V4. I believe that you meant to say that V2 would be the "safety overhead"--no?



Yes... I got the numerical order reversed. I'll correct the error, thanks.
2Bdecided
If you can't hear anything above 16kHz (though see previous threads about the difficulty of testing this properly with some sound cards and headphones), it makes sense to low pass at this frequency, simply because mp3 can be so inefficient when encoding sounds above 16kHz.

Other than that, don't worry, and don't bother tweaking the low pass.

Why? Well, if you can appreciate frequencies above 16kHz in real music, but not, say, above 18kHz, there's little to be gained by including a low pass filter at 18kHz. The efficiency gains aren't great.

On the other hand, if you can't hear anything above 14kHz, or even 12kHz, you might think it makes sense to use those as low pass frequencies - but the efficiency gain over a 16kHz low pass is less than you might expect - and, of course, your music will sound audibly dull to younger listeners.

So I'd say, if you can't hear it, then a ~16kHz low pass makes sense because of inefficiencies within mp3, but anything else isn't worth the tweaking / effort / chance someone else will think it sounds dull.

(If your hearing stops at 5kHz, then YMMV!)


I don't always follow that advice myself! For recordings I might want to play to my kids sometimes, I don't use a lowpass because I can still remember some content benefiting from those high frequencies, back in the days when I could hear them! Mind you, I grew up with FM radio and cassette tapes, so hearing things above 16kHz from reproduced audio was a rare experience! It's things I played (very often!) on my Yamaha keyboard which now sound slightly less bright through the same speakers than they did 15 years ago! The remembered difference is less pronounced with CDs I owned 15 years ago, probably because they had slightly less treble to start with, and because I listened to them on a variety of speakers so don't have a single audible memory to compare them with.

Cheers,
David.
buzzy
QUOTE(2Bdecided @ May 15 2007, 06:05) *

If you can't hear anything above 16kHz (though see previous threads about the difficulty of testing this properly with some sound cards and headphones), it makes sense to low pass at this frequency, simply because mp3 can be so inefficient when encoding sounds above 16kHz.

Other than that, don't worry, and don't bother tweaking the low pass.
Thanks very much for the advice. I'm curious as to why you'd say not to tweak the lowpass though. I guess you mean - just use 16?

At first I thought you meant the width switch. I don't completely understand the --lowpass-width switch, but from looking at the lowpass info in the LAME wiki page, in the lowpass column of the table - if that's the width, it seems far smaller than 15% of the lowpass frequency.

"The remembered difference is less pronounced with CDs I owned 15 years ago" - I don't know that, other than my home stereo, any of the gear I had 15 years ago could reproduce above 16khz!

pdq - I'd be thinking about using the presets, so depending on which one I use, it either comes with the Y switch or it doesn't. Table of presets. I'm going to give V1 or V2 a try first, so those don't have it. In any case, with a 16 kHz lowpass, I'm not sure it would matter too much.

SirChristof - You're right, there is something of a philosophy question here. My philosophy would tend to be to have a bigger safety margin, for a few reasons:
- I want to listen via my home system, too.
- It's not hard to rotate files off and onto a portable if they're a little large.
- Storage is getting cheaper fast, but encoding is time consuming. I'm re-encoding a lot of stuff, which makes me appreciate not having to do it again too soon! Though I have no illusions that this is the last time, and the next time probably will be lossless.

But I'm not ready to take the leap to encoding all my CDs to lossless yet. I have terabytes of flac files, lots of which are inaccessible because they're burned to disc and in a box ... somewhere. I still find the accessibility and portability of lossy to be very valuable, and it is transparent if done thoughtfully for the use it's put to.

As far as ABXing - that's a tough one, for a couple reasons. I'd have to do a lot of files, and listen to them a few times. Suppose there's only a problem with 1 track in 50 - that could be enough to be annoying later on, but I might not catch it now.
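
The "1 track in 50" worry can be put into numbers with a rough Python sketch. It assumes test tracks are picked at random and independently, and that a problem track is always caught when it is tested; both are optimistic simplifications, and the function name is invented for illustration.

```python
def chance_of_catching(problem_rate, tracks_tested):
    """Probability that at least one problem track is among those tested."""
    # Chance that every tested track happens to be a clean one...
    all_clean = (1 - problem_rate) ** tracks_tested
    # ...so the chance of hitting at least one bad track is the rest.
    return 1 - all_clean


for n in (12, 50, 150):
    print(n, "tracks tested:", round(chance_of_catching(1 / 50, n), 2))
```

With a dozen test tracks the odds of running into a 1-in-50 problem are only about one in five, which is the case for a safety margin instead of relying on a short ABX session alone.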

Also, for as long as there have been presets, they've been more or less saying not to try to roll your own settings, there's so much inter-related tuning and complexity that you might get a worse result. So in a way, I need an expert opinion.

Porcupine - as you say, there might be something lost, but I can't point to what it is. On the other hand, I'm very aware of being able to have a lot of music with me, wherever I go or in every place in the house. So trading maybe-losing-something for definitely-gaining-a-lot is worth it to me.

As far as 16khz, all I can say is that people should actually try something like the hearing test I posted to appreciate what they will actually hear at normal volumes (ie, those that won't damage your hearing even more!).

The chart below is hard to read, but it shows two important things: how it changes with age, and how our ears even at a tender age are far less sensitive to high frequencies. And yes, it's a test tone, not music, but the general effects are still there with music too. (Let's not get into hypersonics.)

Threshold of hearing (where you start to hear a given frequency). The numbers (from bottom to top ) are 20, 40, and 60. - link

[image: threshold-of-hearing chart]

BTW, I'm pretty sure my wife's super-annoying ringtone is set to more than a normal talking volume, so maybe the kids phones are too. It always makes me think about going to find my hammer and giving the phone a few whacks til it stops, and then a few more just for payback.
halb27
I'm 57 and realized last week from a graph like yours that hearing ability drops rather steeply already from 2 kHz for people of my age.
Luckily I've always still enjoyed music very much.

As for encoding it's an advantage: we don't have to worry about a lowpass ~ 16 kHz (though I wouldn't go lower), and potential ringing artefacts shouldn't be too much of an audible problem either.

On the down side, we have to be careful when talking about our listening experience. Things that are okay to us needn't be so for other people. But that's a valid statement for other people's experience too - it's just of more serious concern to us.

When encoding mp3 using ~200 kbps (-V2, -V1, --alt-preset 220 or similar) you should feel safe IMO, especially when using an additional lowpass of ~ 16 kHz. For most tracks that's more than needed but gives you some headroom for problematic tracks.
uart
Personally I wouldn't bother with extra settings, the lower bit-rate "Vn" settings automatically add more LPF and reduce the bits used at very high frequencies.

If I were you I'd just do a few listening tests at V5, V4, V3 etc. until you find the setting that suits you. For my needs I find V4 is OK, so I usually select either V4 or V3 to be a little safer.
Porcupine
QUOTE(uart @ May 15 2007, 09:20) *
Personally I wouldn't bother with extra settings, the lower bit-rate "Vn" settings automatically add more LPF and reduce the bits used at very high frequencies.
What uart said is right but adding --lowpass 16000 will still save some more bits. So it's buzzy's own choice to make. About the lowpass width, I'm not sure. There's another recent post in the MP3 General forum where this was discussed a bit but I dunno if a conclusion was ever reached. I would probably just leave it, I don't think it's a big concern for me. Bandwidth filters are never perfect (but they can be very good)...the width is just a setting that affects the way your filter works and there are probably negative consequences to making it too thin. Your hearing doesn't instantaneously just bottom-out at any special frequency either, so a 15% or whatever width seems more than fine to me. If you think the width is too thick, then just lowpass at 16500 instead (this will probably ensure that the lowest frequencies that start to get affected are at 16000 Hz).

I've seen a lot of really different-looking threshold of hearing charts so I don't believe any one chart. I've seen a number of charts that still had people as being reasonably responsive at 20 kHz (moreso than 20 Hz), whereas your chart (which I've seen before also) claims that every human is utterly deaf at 20 kHz (which is also what LAME uses for its own Absolute Threshold of Hearing formulas). I've also seen one chart which had people becoming MORE sensitive with increasing frequency again above 19 kHz, continuing to 20 kHz and beyond. (I've also seen unrelated discussions explaining why this could be so). These were all scientific studies I must assume, but different methods must have been used, and I don't know for sure which are flawed and which are not.

For one thing, I would assume that any study resulting in a chart like buzzy posted is flawed, because the labeling of the lines indicates that the studies were done on "age groups." That's utterly flawed. For one thing, buzzy's earlier chart regarding the ringtones shows that at age 20 people's hearing is on the average already significantly damaged.

Furthermore, studies have been done and the hearing of even most children has already been damaged a little bit. There are also most likely genetic differences between people as well. If such studies are to be believed, the only way to get a reasonable ATH or equal-loudness curve for those with the best ears would be to take a large group of children, test them, and keep only the best performing ones, and make a curve with just them. Even one person, if he demonstrates superior ability to others, deserves his own curve. Also, testing children is difficult, especially young ones, because they will not take the test seriously. Even if they take the test seriously (you must offer them a reward for good performance), a perfect ear is not the same as a trained ear, and children still lack training and experience at hearing subtle sounds. Finding the perfect candidate is very difficult and I doubt any study has gone out of its way to do such a thing. Most of them have only taken statistical averages, which in the case of hearing is a very bad thing to do in my opinion.
boojum
Another thing to consider is if anyone besides you will be listening to these sounds.
indybrett
I used to use V2 with -Y (for the same reason, old ears). Now I use V3. It seems to not encode much above 16K by default, without using the Y switch.
halb27
I used buzzy's interesting hearing test link and was quite amazed that my hearing ability (at my age of 57) doesn't drop as seriously just above 2 kHz as I had feared when I saw the curves; it drops only moderately. At 12 kHz, however, my ears' sensitivity is remarkably lower than at 1 kHz, and at 16 kHz I can't hear anything except the added noise that is produced at 0 dB.
buzzy
porcupine, thanks again for the comments. one thing:
QUOTE(Porcupine @ May 15 2007, 18:10) *
I've seen a lot of really different-looking threshold of hearing charts so I don't believe any one chart. I've seen a number of charts that still had people as being reasonably responsive at 20 kHz (moreso than 20 Hz), whereas your chart (which I've seen before also) claims that every human is utterly deaf at 20 kHz (which is also what LAME uses for its own Absolute Threshold of Hearing formulas).
You're right that the way the 20 line is drawn doesn't seem right, which of course raises questions about the whole thing. But I don't know that it needs to be read too exactly - overall the main conclusions to me were something else - which is that

- people don't start hearing sound at high freqs until it reaches a fairly high db level. That's what the chart is ... the db level at which most of the sample could start to hear the sound. I think that's still true, despite your reservations about the methodology. So you need reasonably high volumes, and good isolation / low background noise, to get those sounds.

- people lose hearing ability with age

And as you say, those are averages, everyone's circumstances will vary.

I know in that hearing test link above, I had to go up 40-45db or so from the 5k level to the 0db level to hear 16khz, and the sound level seemed much lower so the real difference is more.

I'll give a few alternatives a try to see what happens to the bitrates, and look at some of the resulting frequency distributions to try to see what the width command is doing. It may take a while, but I'll come back and post what I find out.
2Bdecided
QUOTE(indybrett @ May 16 2007, 02:47) *

I used to use V2 with -Y (for the same reason, old ears). Now I use V3. It seems to not encode much above 16K by default, without using the Y switch.


According to this...

http://wiki.hydrogenaudio.org/index.php?ti...cal_information

-V3 includes -Y by default.

Cheers,
David.
buzzy
Part 1 - Figuring out how the switches work

EDIT (2) - I'm no longer at all sure about what arguments the switches will accept. The more alternatives I've tried the less clear it becomes. Is it truncating, rounding, accepting only whole digits or certain fractions? Did lowpass width really accept a % argument, or did it just ignore that and use some part of what I entered?

I doubt I'll be able to figure it all out, and it seems a little pointless as I'm sure the developers could eventually just give us the answer. And it doesn't need to be exact. So for now I'm going to stick to fairly simple alternatives: whole numbers, or maybe .5s, in the inputs.

What little I do know at this point:

The following examples of command lines give different results:

-V 1 --vbr-new --lowpass 16
-V 1 --vbr-new --lowpass 16.5 --lowpass-width .5
-V 1 --vbr-new --lowpass 16.5 --lowpass-width 1

I didn't try enough examples to be certain - but for the most part, LAME usually doesn't generate an error, but instead just truncates or rounds the argument (so 16.25 becomes 16); or maybe switches to the default. So command lines like this:

-V 1 --vbr-new --lowpass 16.25 --lowpass-width 1

seem to just give the same result as command lines like this:

-V 1 --vbr-new --lowpass 16 --lowpass-width 1
And for some reason this command line

-V 1 --vbr-new --lowpass 16 --lowpass-width .5

seems to give the same result as this one:

-V 1 --vbr-new --lowpass 16 --lowpass-width 1
But since it's not one I'd be likely to use, I didn't spend time trying to figure that out any more.

The main point being - don't assume LAME will give you an error if you use the wrong inputs for the switches, at least for lowpass and lowpass-width - most of the time it will go ahead and encode and give you something other than what you thought you were getting.

If anybody thinks there's something wrong, let me know. Next I'm going to look at the frequency graphs to see if they look the way they should.
uart
Interesting. Tell me buzzy, how did you measure the "lowpass-width" to tell that those last two were the same? Or did you mean that they were literally the same in terms of a binary file compare?
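
One way to settle the "same result" question is a byte-for-byte comparison of the output files. The Python sketch below only builds command lines from the switches already shown in this thread (it does not run LAME) and compares two files by SHA-256. The helper names and file names are hypothetical, and it assumes the usual `lame [options] infile outfile` argument order.

```python
import hashlib
from pathlib import Path


def lame_command(wav, mp3, lowpass_khz, lowpass_width_khz=None):
    """Build (but do not run) a command line like the ones tried above."""
    cmd = ["lame", "-V", "1", "--vbr-new", "--lowpass", str(lowpass_khz)]
    if lowpass_width_khz is not None:
        cmd += ["--lowpass-width", str(lowpass_width_khz)]
    return cmd + [wav, mp3]


def same_bytes(path_a, path_b):
    """True if the two files are identical byte for byte."""
    def digest(path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest(path_a) == digest(path_b)


# Hypothetical file names, for illustration only.
print(" ".join(lame_command("test.wav", "a.mp3", 16, 1)))
print(" ".join(lame_command("test.wav", "b.mp3", 16.25, 1)))
```

Each list could be passed to `subprocess.run` to do the actual encode, and `same_bytes("a.mp3", "b.mp3")` then tells you whether two settings really produced the same file. A match is strong evidence the switch values were treated alike; a mismatch could in principle come from header or tag data instead of the audio, so a spectrum check is still worth doing.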
Rio
I strongly suggest simply using -V5; it inherently cuts off at about 16 kHz.

If you can ABX -V5 vs. -V2 --lowpass 16000 and convince yourself that -V2 sounds better, then go with -V2. Otherwise, encode at -V5.

Cheers!

Edit: -V2 or as you said, -V1
Porcupine
I haven't tested V5 to see if it truly cuts off at 16 kHz, but you're probably right (it also might vary a little depending on LAME version). But I wouldn't recommend using -V5 just for that effect. Because -V5 will still be of a lower quality in other aspects also. There are times where -V5 and even -V2 is not transparent, someone just posted a sample recently on the mp3 General forum (I'm not encouraging anyone to look for that sample, I'm just giving it as an example so people don't criticize me for saying things without proof).

buzzy, yeah you are right, LAME rounds off your --lowpass ##### setting. It only supports a few coarse steps, roughly 500 Hz apart. Just make sure you check what range the encoding screen output says it is using. I haven't played with the width settings so I dunno how exact those are either; you'll have to figure that out yourself, sorry.

And one last warning, as you've guessed it's probably a good idea not to assume that LAME is doing what it says it is doing. Even if it prints out the lowpass freq and width for you, if you can check it with a separate spectrograph that is better. I think LAME may be accurate in this case, but there's been a lot of times where I discovered LAME doesn't do what it says it is doing (switches not working, behind-the-scenes algorithms forcefully activated/deactivated, etc, even when the --verbose output says they are on/off or doesn't report them). This is all chaotic and depends on LAME version, etc. Bottom line, test yourself if you can. LAME gives me a headache with the way it behaves sometimes.
Rio
QUOTE
LAME 3.97 32bits (http://www.mp3dev.org/)
CPU features: MMX (ASM used), SSE (ASM used)
Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz
Encoding C:\Documents and Settings\Rio\My Documents\My Music\temp.wav
to C:\Documents and Settings\Rio\My Documents\My Music\temp.mp3
Encoding as 44.1 kHz VBR(q=5) j-stereo MPEG-1 Layer III (ca. 11.9x) qval=3
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
7900/9030 (87%)| 1:06/ 1:16| 1:06/ 1:16| 3.0803x| 0:09
32 [ 7] *
40 [ 21] *
48 [ 13] *
56 [ 12] *
64 [ 7] *
80 [ 148] %***
96 [1408] %%%****************************
112 [3091] %%%%%%%%%%%%%%******************************************************
128 [1852] %%%%%%%**********************************
160 [1258] %%%%%%%*********************
192 [ 72] %*
224 [ 10] %
256 [ 0]
320 [ 1] *
-----------------------------------------------------------------00:29---------
kbps LR MS % long switch short %
120.3 16.9 83.1 98.2 1.1 0.7


LAME 3.97 @ -V5 applies polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz.
EncSpot reports lowpass filter of 16000 Hz.

But the LAME wiki (http://wiki.hydrogenaudio.org/index.php?title=LAME), which lists the lowpass each preset uses, gives a different transition band for -V5: 16538 Hz - 17071 Hz.

I agree with Porcupine regarding *un*documentation on fixes of previous LAME builds. Even the wiki needs to be fixed.

At any rate, ABX yourself.
buzzy
QUOTE(uart @ May 20 2007, 00:56) *

Interesting. Tell me buzzy, how did you measure the "lowpass-width" to tell that those last two were the same? Or did you mean that they were literally the same in terms of a binary file compare?
To be honest, I just looked at the byte size, as the odds of it being exactly the same seemed very unlikely given the complexity of an encode. But I just did a file compare, too, to confirm that they were identical. The key point, again, is that it's not clear what arguments these switches can take, so it's good to test them to be sure they are doing what was intended.
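
A rough Python sketch of that kind of check, for anyone who wants to repeat it: compare sizes first, then hash the contents. The helper name `files_identical` and the file names in the usage comment are hypothetical.

```python
import hashlib
import os


def files_identical(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the two files have exactly the same bytes."""
    # Different sizes can never be identical, so skip the hashing.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False

    def digest(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large encodes don't have to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    return digest(path_a) == digest(path_b)


# Hypothetical usage, with two encodes that differ only in --lowpass-width:
# files_identical("width_0.5.mp3", "width_1.mp3")
```

Equal byte size alone is only a strong hint; the content comparison is what confirms that two switch settings produced the same encode.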


UPDATE - I've had to revise what I posted here, this is my current guess (!):


--lowpass-width switch syntax/arguments

It seems that lowpass-width might interpret inputs as a % of the lowpass frequency, rather than as a khz number. For example, --lowpass-width 10 creates a width of 10% of the lowpass frequency. (So some of the seemingly unusual things noted above in this thread are understandable, such as why lowpass-widths of .5 or 1 get rounded to the same result because it's interpreted as 1%.)

It seems to accept various integer values - I've tried it with 1, 2, 3, 5, 10, 15, and 20, and the resulting variations in file size seem to indicate that it's working as expected. Not everything worked - 25 seemed to give the same result as 20 for lowpass 19. I didn't try any decimals (such as 7.5), no real point in that.

This part of the longhelp is a little misleading, if you ask me:

--lowpass-width <freq> frequency(kHz) - default 15% of lowpass freq
Something more like this would be more useful:

--lowpass-width <number> % of lowpass frequency - default 15 (%)
Though I'm not sure when the default kicks in, and I wouldn't rely on it.

What may be most interesting is that if you use lowpass without the width switch, you get a width of zero, not the supposed 15% default. See the illustration below - the no width example.

How --lowpass-width works (?!)

Lowpass width seems to create a band below the frequency set with the --lowpass switch, over which the sound is phased in.

It seems to provide a way to have a less dramatic cutoff than the typical lowpass would have.

So, for example, instead of having a sharp cutoff at 16khz, you might use a lowpass of 17khz and a lowpass-width of 6% to have something of a phased transition from 16 to 17.

See the illustrations below.

How it's best used in practice is an open question.

Illustrations

Here's an illustration of an original wav file and several decoded wav files from various encodes. The encoded files were -V 0 --vbr-new --lowpass 19 encodes (the high lowpass of that preset gives a little more room to play around with the spectral graphs). The various graphs show the same four seconds of music for:

- the original wav file
- a -V 0 --lowpass 19 encode with no --lowpass-width - it's a fairly sharp cutoff
- the same with --lowpass-width of 1 - not much effect, though maybe the surprise is that there's clearly some effect even at such a low number
- --lowpass-width of 10
- --lowpass-width of 20

[Image: spectral graphs of the original wav and the four decoded encodes listed above]


A couple things to note -

- the peaks in the lower graph do correspond to the peaks in the original, which seems intuitively a good thing
- as you might guess, the corresponding file sizes get smaller as you move down the graph.

Again, the most surprising thing perhaps is that it seems that when the lowpass switch is used without the width switch, there is no width, not the supposed default of 15%.

The other curious thing that seems to be happening is that it's affecting the frequencies well below where it should - compare the bottom 2-3 graphs in the lower part of what's shown. For example, if the width is a %, even at 20% x 19 = 3.8 khz - then you'd think it wouldn't be affecting the frequencies below 15k at the left side - but it seems to be.

My first reaction is - Why not just document this a little bit and save everybody a lot of time? My second reaction is - given that it's not clear enough what lowpass-width is doing, it may be something that's not very usable in practice.

I'll leave more detailed interpretation for later posts or for others to add.
buzzy
QUOTE(Porcupine @ May 20 2007, 04:29) *
And one last warning, as you've guessed it's probably a good idea not to assume that LAME is doing what it says it is doing. Even if it prints out the lowpass freq and width for you, if you can check it with a separate spectrograph that is better. I think LAME may be accurate in this case, but there's been a lot of times where I discovered LAME doesn't do what it says it is doing (switches not working, behind-the-scenes algorithms forcefully activated/deactivated, etc, even when the --verbose output says they are on/off or doesn't report them). This is all chaotic and depends on LAME version, etc. Bottom line, test yourself if you can.
Very true, and I'm quoting it again to make sure word gets around.

I don't view that as a knock on the developers. It's more a matter that LAME seems so finished and elegant from the outside, so it's easy to assume all the many switches work, in a useful way. But people need to keep in mind that it (like all software) is written to do what it's tested to do, not to be perfect in every possible combination of inputs.

As far as documentation - I was definitely getting that feeling when I discovered that the only place the Y switch seems to be documented is in the LAME-generated longhelp.

Rio - based on the fact that even with --lowpass 16, file sizes get smaller and smaller as you move from V0 on down - LAME does seem to be taking out bits below 16khz. While it might be transparent down to some level, what I'm looking for is something that I could safely use for home audio too.

So I'm guessing I'll end up with something like V1 or V2 with a lowpass at 16.5 or 17 and a width of 5 or so.

But I wanted to narrow down the options some before I started ABXing, to just the options I'm thinking about using, or even just the originals and the proposed settings; I think I've saved myself some time by doing that.
buzzy
OK, well, one more thing on the lowpass-width ... here's a chart showing the relative file sizes. This only used one file, but I think it's indicative enough to post.

Notice that even with very high widths, the bitrate doesn't go down dramatically in this example. (The curve might be different at different lowpass numbers - this used 19 - but I'd still expect similar general behavior.)

[Image: chart of relative file sizes for different --lowpass-width values]


So, is lowpass-width expecting a khz number, or a % of the lowpass?

Things that suggest a %:

- The effect on the bitrate seems small for what you'd expect at even 1 or 2 khz.
- Numbers that would be huge as a khz number, like 15 or 20, don't have that much effect on the bitrate.

Things that suggest a khz number:

- It seems to affect frequencies below the expected % range

Other observations would be welcome.


Anyway, the only thing I have concluded with any certainty: lowpass-width is too unpredictable and unknown to use, so I'm just going to try some straight lowpass encodes for what I need.
pdq
It would not be surprising if changing the filtering above a certain frequency affected frequencies below that frequency. As you reduce the amplitude of the higher frequencies, they provide less masking of the lower frequencies, which are then more likely to be encoded.
Rio
QUOTE(buzzy @ May 15 2007, 02:31) *

Thanks. All true, but based on what people have said, it seems like using v1 or v2 and a lowpass might give better perceived quality in less space.


I just did a simple test, encoding the same track with different settings:
  • -V2
  • -V2 --lowpass 16000
The first file yielded an average bitrate of 191, while the second file yielded 187, shaving off 4 kbps from the lowpass. Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know (I could assume it would, but is it audible at all?)

My next best bet is that you ABX two -V2 encoded files, the one at normal setting and the other with --lowpass 16000. If you can't hear the difference, then continue with the lowpass for your next encodings. If you can hear the diff, then REJOICE! Your ears aren't really that old at all!

EDIT: Ok, this is just 1 track...
Porcupine
I just noticed the following...I'll use Rio's convenient encoding log since it's here:

Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz

Like I said earlier, the default width of the lowpass filter is generally around 500 Hz. What I never bothered to consider before is what percent is that of the lowpass frequency.

534 Hz / 16093 Hz = 3.3%, not 15%

So I guess maybe this solves the mystery. Yet another example of where the LAME documentation or descriptions are horribly out-of-date, but at least the encoding output seems to be telling the truth in this case.
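
The same arithmetic as a small Python sketch, reading the numbers straight from the transition band line in the encoding log. The log line is the one quoted above; the function names are made up for this example, and the regular expression only assumes the wording seen in that one log.

```python
import re

LOG_LINE = "Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz"


def transition_band(line: str) -> tuple[int, int]:
    """Pull the lower and upper edge (in Hz) out of a transition band log line."""
    m = re.search(r"transition band:\s*(\d+)\s*Hz\s*-\s*(\d+)\s*Hz", line)
    if m is None:
        raise ValueError("no transition band found in line")
    return int(m.group(1)), int(m.group(2))


def width_percent(low_hz: int, high_hz: int) -> float:
    """Width of the band as a percentage of its centre frequency."""
    width = high_hz - low_hz
    centre = (low_hz + high_hz) / 2
    return 100 * width / centre


low, high = transition_band(LOG_LINE)
print(f"width {high - low} Hz, {width_percent(low, high):.1f}% of the centre frequency")
```

For the line above this gives a 534 Hz width, about 3.3% of the centre frequency, matching the hand calculation.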

Looking at buzzy's spectrographs, it's hard to tell if they confirm the default width of 3% or not, but I'd say it's not inconsistent. buzzy thought it was 0% width, but it's hard to tell from looking at that. Also as pdq said, there can be secondary effects which complicate the analysis when trying to see the difference between 1% or 2%.

Regarding the shrinkage of overall filesize by a marginal amount, it doesn't surprise me. The higher "freqs" in general are encoded at a much lower resolution (very quantized) compared to the middle and low freqs. In separate tests, I crudely estimated that a given high freq only uses 20% (one fifth) of the amount of data a "middle" freq (such as 400 Hz to 1 kHz) would use. This was for encoding at the highest quality settings, where the highs are relatively unquantized (at lower quality settings, the highs get sacrificed much more, their quantization algorithm changes and their quantization goes up by 2x, 4x, or more...even if you don't apply a lowpass).

Therefore if you lowpass all freqs above 16 kHz, you might only save 6 kHz / 22 kHz / 5 (extra quantization) = 5% of all your bits. This amount isn't insignificant, but it's not as huge as some people might expect. This is one reason why I personally encode everything with -k (no filters). I am only using a little bit of bits for the highs anyway, so I encode them whether I think I can hear them or not. But that's my own encoding preferences.
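
That back-of-the-envelope estimate can be written out as a tiny Python sketch. It keeps the same crude assumptions as the paragraph above: bits spread evenly over 0 to 22 kHz, and the highs costing one fifth as much as a middle frequency. The function name and its defaults are only illustrative.

```python
def estimated_bit_savings(cutoff_khz: float,
                          top_khz: float = 22.0,
                          relative_cost: float = 0.2) -> float:
    """Rough share of all bits saved by removing everything above cutoff_khz.

    relative_cost is the assumed cost of a high frequency compared with a
    middle frequency (one fifth in the estimate above).
    """
    removed_share = (top_khz - cutoff_khz) / top_khz
    return removed_share * relative_cost


print(f"{estimated_bit_savings(16.0):.1%}")
```

With a 16 kHz cutoff this lands at roughly 5%, which is where the figure in the post comes from.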

My calculation seems roughly consistent with buzzy's graph. As he increased his lowpass width by 20% of 16 kHz (about an extra 3 kHz) or so, he saved an extra 8% of bits (you start to save more and more the lower you go because the lower freqs are less quantized and take up more bits). If a lowpass width of 100% were accepted you'd be left with no sound and a 0 kbps file; as long as the curve seems headed in that direction everything is fine. Hrmm, upon closer inspection that doesn't seem to be the case though, oh well! The curve should be curving the other way; maybe if you did more points it would start to curve that way.

QUOTE(Rio @ May 21 2007, 20:35) *
Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know
I dunno either. In theory a perfect VBR should output a 16 kHz lowpassed file of the same level of sound quality as the non-lowpassed version, just smaller in size (and lowpassed). But VBR is not guaranteed to be truly perfect.

Either way though to me it's the same thing. If the file size is smaller and the quality is the same, isn't that good too? If you still want better quality then you can increase your V setting, therefore increasing the quality and maybe getting a file the same size as you used to (before applying lowpass).
[JAZ]
QUOTE(Porcupine @ May 22 2007, 04:02) *

Therefore if you lowpass all freqs above 16 kHz, you might only save 6 kHz / 22 kHz / 5 (extra quantization) = 5% of all your bits. This amount isn't insignificant, but it's not as huge as some people might expect. This is one reason why I personally encode everything with -k (no filters). I am only using a little bit of bits for the highs anyway, so I encode them whether I think I can hear them or not. But that's my own encoding preferences.


Have you heard about sfb21 problems, bitrate bloat with highs, and what-not? Just to give you a quick point: encoding in the sfb21 (which starts at 16Khz precisely), can make *all* bands use more bitrate, not just this band.
Porcupine
I have heard about this issue but none of the descriptions I could ever find regarding it were intelligibly written. If you can point me to a reputable, well-written description of the issue I would be very appreciative.

> Just to give you a quick point: encoding in the sfb21 (which starts at 16Khz precisely), can make *all* bands use more bitrate, not just this band.

The way you've worded this, it doesn't mean anything to CBR. It could only affect VBR.

There's no way encoding more frequencies above 16 kHz could force a CBR file to suddenly use more bitrate. The bitrate is set with CBR. The only thing that the high frequencies could do is steal some bitrate from the frequencies below 16 kHz, which is of course what happens. According to my tests though (and buzzy's also, I think), the amount stolen is very small. High freqs use only about 1/5th of the bitrate an average middle frequency such as 1 kHz uses.

Regarding VBR, sure anything could happen. If what you say is true, then encoding frequencies above 16 kHz suddenly makes the VBR algorithm confused and the whole file (all frequency bands) bloats up in size and uses way more bitrate. Your file though should have improved sound quality because of this, too. So even then it's not a real problem, it's just the VBR intelligence of LAME being stupid (if what you say is true).

On the other hand, if you meant to say "wasting" bitrate, rather than "using" bitrate, that's a different story. A clear description of this "problem" or reputable reference would be appreciated. But even if LAME were suddenly to "waste" bits when told to encode high freqs, that might be a flaw in the LAME encoder, not a flaw in the mp3 codec itself.
pdq
This is not a LAME problem but rather a design flaw of MP3 itself. My understanding (which is very limited) is that since there is not a separate gain term for frequencies above 16 kHz (as there is for all other frequency regions), the only way to adjust the gain when trying to reproduce high frequencies is to adjust the global gain, which changes the gain for ALL frequencies, even those for which this will just be wasting space. The degree of waste is very dependent on the source material so don't expect to try one or two tracks and draw conclusions from that.

The alternative is to not change the global gain, which means that the high frequencies will not get the bits that they need but the rest of the bands will get the proper amounts. This is what the -Y switch does.
buzzy
QUOTE(Rio @ May 21 2007, 22:35) *
I just did a simple test, encoding the same track with different settings:
  • -V2
  • -V2 --lowpass 16000
The first file yielded an average bitrate of 191, while the second file yielded 187, shaving off 4 kbps from the lowpass. Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know (I could assume it would, but is it audible at all?)
I'll come back and read the other comments more carefully - but before we get too far down this track, I think this is just the effects of a typo in how you entered it.

LAME is expecting a khz number, so if you literally used lowpass 16000, that's 16000 khz. (So, more pure LAME quirkiness that it even dropped 4 kbps.)

I did a quick test track and got

V 2 - 194 kbps
V 2 --lowpass 16 - 166 kbps

So a big difference, about 15%, which I would have expected given that the lowpass in the wiki is well over 16 khz - 18671 Hz - 19205 Hz for V 2
Porcupine
What version of LAME are you using, buzzy?

The different versions may want to take in different inputs for the --lowpass switch. I currently use LAME 3.95 (will switch to 3.92 soon) and my LAME wants lowpass values in Hz....such as --lowpass 16000. But I think it depends on version so I never said anything before.

QUOTE(pdq @ May 22 2007, 18:29) *
This is not a LAME problem but rather a design flaw of MP3 itself. My understanding (which is very limited) is that since there is not a separate gain term for frequencies above 16 kHz, the only way to adjust the gain when trying to reproduce high frequencies is to adjust the global gain
Thanks for the description, pdq. I'd already heard this exact same thing before as well. Again, to me this description is not in-depth and reputable enough to prove to me that this problem exists. And I've seen no evidence of any problems like this when looking at the MDCT quantization levels in the mp3s I've encoded.

First off, I might be inclined to think it foolish to have a gain value for each scalefactor band, plus one global gain value. This is redundant by one gain value. I see no need for a global gain value if there were a gain value for each scalefactor band. And if there is a global gain value, I would indeed take away a gain value for one scalefactor band (doesn't matter which one) so that I don't store redundant information in my encoded files.

If you wish to scale the data in SFB21, simply adjust the global gain value then re-adjust the 20 (or however many there are) remaining gain values for each scalefactor band as necessary. What is so hard about that? If you wish to prove that there is an issue, you must at the very least provide detailed information on the bit-depth of the scalefactors (are they 8-bit integers? 16-bit integers? 24-bit integer/floating-point? etc). You might also need to provide detailed information on other things as well, depending on how your argument goes. So far the argument given does not prove anything.


pdq
As I said, my understanding is very limited. About all that I can add is that the gain values are 8 bits and logarithmic, with a resolution of about 1.5 db per step.
Rio
I must say, I haven't really consulted the LAME --longhelp in applying the lowpass filter. I just used Porcupine's earlier setting. However, LAME 3.97 would still yield the same file, regardless of --lowpass 16 or 16000 (as confirmed by another simple encoding using such settings and as reported by EncSpot of lowpass filter 16000.) LAME --longhelp stated that the lowpass is in khz though (I couldn't blame the devs about the documentation, though.) "Hey, it's a free program! Why complain?"

It's just my track's idiosyncrasy that LAME was just able to shave off 4kbps using the lowpass. At least you were able to save 15% space, a very significant savings.

At any rate, I think we are getting there to your encoding dilemma. I hope we were able to help you.

Cheers!
buzzy
QUOTE(Porcupine @ May 22 2007, 21:24) *

What version of LAME are you using, buzzy?

The different versions may want to take in different inputs for the --lowpass switch. I currently use LAME 3.95 (will switch to 3.92 soon) and my LAME wants lowpass values in Hz....such as --lowpass 16000. But I think it depends on version so I never said anything before.
I'm using 3.97. So apparently the inputs could be different with different versions! Thanks for the heads up.

In any case, there have to be significant bitrate savings in shaving 2-3 khz off the lowpass vs. the presets.
QUOTE(Porcupine @ May 21 2007, 23:02) *
I just noticed the following...I'll use Rio's convenient encoding log since it's here:

Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz

Like I said earlier, the default width of the lowpass filter is generally around 500 Hz.
I'm not sure the presets use the defaults, though. And is it clear that the transition band is the same thing as lowpass width?
QUOTE
What I never bothered to consider before is what percent is that of the lowpass frequency.

534 Hz / 16093 Hz = 3.3%, not 15%

So I guess maybe this solves the mystery. Yet another example of where the LAME documentation or descriptions are horribly out-of-date, but at least the encoding output seems to be telling the truth in this case.

Looking at buzzy's spectrographs, it's hard to tell if they confirm the default width of 3% or not, but I'd say it's not inconsistent. buzzy thought it was 0% width, but it's hard to tell from looking at that. Also as pdq said, there can be secondary effects which complicate the analysis when trying to see the difference between 1% or 2%.
There's definitely less width in the no width example than the 1 example, for what that's worth. Look at the section at the far right, for example. Having looked at hundreds of these graphs over the years, that difference is meaningful. There's definitely a little less energy in the 1 than in the no width. So I don't know that the lowpass default uses 3%, either.

Just to be precise, though, no width meant no --lowpass-width setting was used. It isn't necessarily the case that there is 0 width.
[JAZ]
QUOTE(Porcupine @ May 23 2007, 02:24) *

First off, I might be inclined to think it foolish to have a gain value for each scalefactor band, plus one global gain value. This is redundant by one gain value. I see no need for a global gain value if there were a gain value for each scalefactor band. And if there is a global gain value, I would indeed take away a gain value for one scalefactor band (doesn't matter which one) so that I don't store redundant information in my encoded files.


From : http://wiki.hydrogenaudio.org/index.php?title=Scale_factor
QUOTE

Thus they [scalefactor] implicitly modify the bit-allocation over frequency since higher spectral values usually need more bits to be coded afterwards.

Whenever a scalefactor band is amplified, it will force the next quantization to use more bits for that band. This will result in more bits used to encode the MDCT coefficients in that band, and thus less quantization error. That is why bands with audible distortion are amplified. However, it will also result in less bits for the unamplified bands.


From http://www.mp3dev.org/ (MP3 -> MP3 Limitations)
QUOTE

To increase sfb21 resolution, the global gain value has to be reduced. To balance this, scalefactors of other scalefactor bands can be reduced. But once they reach a value of 0, they can not be reduced anymore, meaning that an higher than needed resolution will locally be used in those bands, leading to an inflate of the bitrate.


You can think that these links are not reputable (that's your option), but I think they say clearly enough that you have to either use more bits, or have a worse quality. In fact, *you* even say it.
Take your option.

For the sake of completeness :

LAME -V2 usually leads to ~200 kbps with typical material. With metal tracks it is said to average 250 kbps. That's a whopping 25% more.

Porcupine
The size ranges for LAME VBRs, for a given V setting and other parameters, also depend super-humongously on the LAME version you've used. I'm not saying your ranges are wrong, just a heads-up and cautionary warning, in case people notice discrepancies.

Also, just wanted to point out that the difference in size between difficult-to-encode metal tracks and "typical material" is not necessarily of any direct relation to the difference in size between lowpassed and non-lowpassed VBRs.
QUOTE
To increase sfb21 resolution, the global gain value has to be reduced. To balance this, scalefactors of other scalefactor bands can be reduced. But once they reach a value of 0, they can not be reduced anymore...
That link is laughably unreputable. It's not because of the site it is from, I am judging merely by what was written. Just read that. It is not even logical.

The global gain has to be *reduced*...then balanced out by *reducing* the scalefactors of the other bands as well? One of those is supposed to be *increased*, I dunno which one because this entire argument makes no sense. And a scalefactor reaching a value of 0? That means that there is no sound, that is stupid.

From the hydrogenaudio wiki (which has been proven to contain numerous errors in the past, but oh well) link you gave:
QUOTE
In Mpeg layer3 the global gain defines the largest stepsize to use. The scalefactors are used to reduce the stepsizes for the special needs of the scalefactor bands.
This sounds great to me, no arguments here. So why is this a problem? This is exactly how things are supposed to be. The global gain, which in some sense corresponds to the SFB21 gain (which is "missing"), defines the largest stepsize to use. That is how it is supposed to be because SFB21 (the ultra-highs) should have the largest quantization of all the freqs. After that, the 20/21 remaining scalefactors are used to reduce the stepsizes for the other bands, it says. Yeah, that's how it should go, the lower freqs need a smaller scalefactor so that they can be encoded more carefully. Where is the problem with that? There's no flaw of the codec that I can see.

The second half of the hydrogenaudio wiki link you gave though, does not seem logically consistent with what was written in the top half I quoted. Also numerous things were written which are blatantly wrong, I think, but oh well. In any case, these arguments don't convince me, a more reputable source (or at the very least, one that is self-consistent and logically written) is needed.

EDIT: buzzy, hrm yeah I'm not sure about the 3%, 1%, vs no-width specified thing. I still think your graphs may be inconclusive in that subtle comparison (as pdq mentioned, secondary effects can complicate things). I'm not saying your conclusion is wrong. I just don't know...it's interesting, I hope you manage to get to the bottom of it.
halb27
The outcome from reading something does not only depend on whether it's written well and correctly but also on our own background for understanding things (technical stuff can rarely be written in a self-contained style).

As for the mp3 restrictions on scalefactor handling and the potential resulting sfb21 bloat, I personally don't know the details. But I take it as a fact: I consider the chance of this hypothesis being wrong so small that I wouldn't make any effort to prove it wrong. There is no use to me in learning about scalefactor details, and no reason to disbelieve these commonly cited mp3 restrictions.
robert
OK, here is a simplified description of the sfb21 problem; for a more detailed explanation read the ISO docs or the LAME sources:

for long blocks:
a global gain, an 8 bit value: range 0-255
scalefactor bands 0-15, a 4 bit value: range 0-15
scalefactor bands 16-20, a 3 bit value: range 0-7
scalefactor band 21, 0 bits: range 0-0

oversimplified: the resulting step size for each scalefactor band:
stepsize[i] = global gain - 210 - 2 * scalefactor[ i ]
as there is no scalefactor for sfb21:
stepsize[21] = global gain - 210

Now suppose your psymodel says you need a smaller step size for sfb21. Then, as you can see, every other scalefactor band will have to use a step size less than or equal to that of sfb21.
As long as the step size demanded for sfb21 is the largest one, there is not much bitrate bloating. Bloating starts when sfb21 demands a smaller step size than one of the other bands demands.

Btw, for short blocks there is a sfb12 problem like the one above.

This is a design flaw of MPEG Layer3, not LAME specific. It seems to me, the Inventors of Layer3 had only samples with a high frequency boost to play with.
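
To make the mechanism concrete, here is a toy Python sketch built only on the oversimplified formula and the long-block scalefactor ranges listed above. It is not LAME's actual allocation code: the function names, the integer "demanded step size" inputs and the example numbers are all assumptions for illustration, and the 0-255 limit on the global gain is ignored.

```python
# Largest scalefactor per long-block band, from the ranges listed above:
# bands 0-15 -> 0..15, bands 16-20 -> 0..7, band 21 (sfb21) -> no scalefactor.
MAX_SCALEFACTOR = [15] * 16 + [7] * 5 + [0]


def allocate(demanded):
    """Pick a global gain and scalefactors for 22 demanded step sizes.

    demanded[i] is the coarsest (integer) step size band i tolerates, in the
    same simplified units as the formula above.
    Returns (global_gain, scalefactors, actual_stepsizes).
    """
    if len(demanded) != len(MAX_SCALEFACTOR):
        raise ValueError("expected 22 bands")
    # Highest global gain at which every band can still reach its demand
    # using its largest allowed scalefactor.
    global_gain = min(d + 210 + 2 * m for d, m in zip(demanded, MAX_SCALEFACTOR))
    scalefactors = []
    for d, m in zip(demanded, MAX_SCALEFACTOR):
        excess = global_gain - 210 - d       # how far the base step overshoots
        sf = max(0, (excess + 1) // 2)       # smallest scalefactor that fixes it
        scalefactors.append(min(sf, m))
    actual = [global_gain - 210 - 2 * sf for sf in scalefactors]
    return global_gain, scalefactors, actual


def wasted_steps(demanded):
    """Per band: how much finer the step size ends up than was asked for."""
    _, _, actual = allocate(demanded)
    return [d - a for d, a in zip(demanded, actual)]


if __name__ == "__main__":
    quiet_highs = [10] * 21 + [20]   # sfb21 tolerates the coarsest step
    picky_highs = [10] * 21 + [4]    # sfb21 demands a finer step than the rest
    print(sum(wasted_steps(quiet_highs)), sum(wasted_steps(picky_highs)))
```

In the first case every band gets exactly the step size it asked for. In the second, the global gain has to drop for sfb21, the other scalefactors are already at 0 and cannot compensate, so all bands end up with a finer step than they needed, which is the bloat.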
Porcupine
QUOTE(robert @ May 24 2007, 03:24) *
bloating starts, when sfb21 demands a smaller stepsize than one of the other bands demands.
I see. Thank you VERY MUCH for such a clear and concise explanation of the issue. Now, the earlier quote from mp3dev.org makes sense too (before it made no sense to me because the definitions of the scalefactors weren't written). Again, thank you very much.

QUOTE
It seems to me, the Inventors of Layer3 had only samples with a high frequency boost to play with.
I see what you mean. But, maybe the developers of Layer 3 used such quirky definitions for the scalefactors and step sizes, to conserve header/sideinfo space for each frame. The scalefactor info appears to take 11 bytes to store, maybe 22 bytes for both channels (not sure), this is not insignificant compared to the total framesize of a "typical" 160 kbps mp3 (slightly over 511 bytes, I think).

It looks to me like the quirky definitions and bit-depths of the various scalefactors were carefully crafted by the designers of Layer 3 to try to provide maximum flexibility with a minimal amount of bytes needed to store the scalefactors. Besides the phenomenon of bloating you just described, thanks to your definitions I see that other problems could possibly arise as well with those weird scalefactor definitions. If the high-freqs were super loud, then possibly the global gain could become too large, and scalefactor bands 0-20 might not have enough bit-depth to have the required small stepsize (the opposite problem of the "bloating", in a sense).

---------------

To the others/everyone, now I see that the problem of bitrate bloating is real. But is this a reason to choose not to encode any freqs above 16 kHz (via lowpass filter)? I often see that given as a reason, but I still don't know if I can agree with that. I think you may just need to do the best you can, given the limitations of the codec.

Note to buzzy: I'm not criticizing your desire to lowpass at 16 kHz, your reasons are different.
halb27
People who don't like the idea of giving away something from the original music may want to keep the highest frequencies to the utmost extent. These people usually are willing to pay the price and accept the need for a higher bitrate. Everything's fine. That's what you do, Porcupine.

People who realize by listening tests that the musical content above 18 or 17 or 16 kHz (whichever applies to them individually) is of no real significance to them may prefer lowpassing, as it improves encoding efficiency (especially with mp3) and/or overall quality. With mp3, a rather low lowpass of 16 kHz or so is most welcome because of the sfb21 issue, provided of course that it has no real impact on enjoying the music. That seems to be the case for most people, judging by the last 128 kbps test, where LAME 3.97 -V5 with its 16 kHz lowpass came out great.
buzzy
So, what I decided to do was to just use -V2 --lowpass 16, although whether it saves much space will depend on the kind of music you're encoding.

As a limited sample, I encoded several types of music with and without the lowpass switch, using both V1 and V2:

CODE
                  U2       JS      Iron     Norah
                           Bach   & Wine    Jones
Bitrates
     v1           218      212     220       190
     v1-16        185      210     206       178
     v2           188      189     190       163
     v2-16        161      187     181       155

Filesize comparison
     v1-16        85%      99%      94%      94%
     v2-16        85%      99%      95%      95%


The columns are the different CDs that were encoded:

U2 - Greatest Hits 1990-2000
JS Bach - Violin Concertos
Iron & Wine - Our Endless Numbered Days
Norah Jones - Not Too Late

The U2 saw the largest reduction (15%), the classical the smallest (1%). The two vocal & instrumental albums saw modest reductions of about 5-6%. And of course the U2 was the one I was using for my initial trials, so it made the potential savings look larger than they really would be for a range of music.
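
For anyone who wants to repeat the comparison on their own rips, here is a small Python sketch that derives the relative sizes from the average bitrates in the table. The dictionary layout and function name are just for illustration. Since the posted bitrates are rounded, the ratios computed this way can differ by about a point from the file-size percentages above.

```python
# Average bitrates (kbps) copied from the table above.
bitrates = {
    "U2":          {"v1": 218, "v1-16": 185, "v2": 188, "v2-16": 161},
    "JS Bach":     {"v1": 212, "v1-16": 210, "v2": 189, "v2-16": 187},
    "Iron & Wine": {"v1": 220, "v1-16": 206, "v2": 190, "v2-16": 181},
    "Norah Jones": {"v1": 190, "v1-16": 178, "v2": 163, "v2-16": 155},
}


def relative_size(album, preset):
    """Lowpassed bitrate as a percentage of the plain preset's bitrate."""
    row = bitrates[album]
    return 100 * row[f"{preset}-16"] / row[preset]


for album in bitrates:
    v1 = relative_size(album, "v1")
    v2 = relative_size(album, "v2")
    print(f"{album:12s}  v1-16: {v1:5.1f}%   v2-16: {v2:5.1f}%")
```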

Probably not worth fooling around with for most people, and it all just reminded me that getting into the innards of LAME is usually a waste of time, as well as turning up a lot of stuff you'd rather not know - about how only the core functions work well, and almost no functions are well documented.

I can see myself encoding it all to FLAC at some point, too, so I decided to go with V2 rather than V1.
