Full Version: MPC VBR flaws (low volume & ringing)
Hydrogenaudio Forums > Lossy Audio Compression > MPC
guruboolez
I’ve recently read some complaints about musepack and distortions occurring with classical music (examples here and here). There were no ABX tests to confirm them. According to my previous listening tests at ~175 kbps, musepack performs not only very well with various kinds of instrumental and vocal samples, but also better than its competitors. But I’ve also noticed in the past one issue with this audio format that my previous tests didn’t reveal, and it’s a very big one. I’d like to bring this problem to the attention of the community, which, as far as I know, hasn’t been warned about this kind of flaw.

Before carrying on, and before some zealous users bare their teeth, I have to make clear that this issue only occurs in specific conditions. The problem is confined to low-volume musical content, and is mainly audible when this content has to be listened to at a higher playback volume. In other words, affected tracks must have low-volume parts, and tracks with high dynamic range are not really affected (you can’t constantly push the volume on such material: your neighbors won’t appreciate it). The problem becomes really critical with low-volume tracks only. People who have to live with the consequences of the “loudness war” are certainly not used to encountering such tracks, but for classical fans, tracks that are replaygained at +10 dB, +20 dB and sometimes +30 dB are anything but rare (tracks with corrected gain beyond +25 dB are nevertheless very rare). The encoded material exhibits strong artifacts with ReplayGain set to Track Mode (they won’t be audible otherwise, except maybe as a subtle form of distortion – which could explain some recent complaints about musepack and classical music). With RG enabled, even untrained people will be shocked by the terrible ringing that runs through this musical material. MPC with the --standard profile, and to some degree --extreme and also --insane, is apparently not sensitive enough to handle low-volume situations.


At this stage of my account, some people will probably be tempted to claim that such an issue is normal with perceptual encoding, and that all other formats will suffer from the same issue in this specific playback condition. But a quick comparison immediately invalidates this idea. I’ve compared musepack --standard to comparable MP3, AAC and Vorbis presets, and these competitors showed the ability to encode the same material properly (no ringing, flat lowpass at high level). Even stranger, MP3 at 128 kbps, or Vorbis at 90 kbps (!), or AAC (faac!) at 100 kbps perform *much* better than musepack --standard. In other words, perceptual encoders (at least modern ones) can handle this situation transparently at mid/low bitrates, even with VBR; only musepack fails, and badly. It might be interesting to note that the VBR model is apparently flawed: with --standard, the bitrate drops to unusual values (110…140 kbps), and quality to an even more abnormal level. An illustration (graphical – listening tests were performed upstream - click for link) could make things easier to understand:



I’ve also uploaded an additional gallery - the last one looks very weird, and sounds even worse than it looks.


The ringing, and the austere lowpass, are obvious on these screenshots. Quality is objectively worse than MP3@128; subjectively speaking, the audibility is, as usual, linked to various conditions: hardware, player settings (RG or not), the listener’s sensitivity to ringing. Some users won’t notice it, some others will be frightened. The important point to note here is that other audio formats have no such problems; my purpose wasn’t to make a sterile comparison between MPC and the others. Based on this comparison, I’m tempted to say that MPC could catch up with them with some tuning. Anecdotal point: LAME recently had serious issues (which also concern 3.90.3 ABR at mid/low bitrate) and they were recently solved by the developers. I think Gabriel worked on an adaptive ATH threshold, and it might be a lead for the MPC developers or for users who are interested in playing with the current encoder switches.


I’ve uploaded some samples. The gain for short samples is necessarily different from the gain of the complete track, but I’ve tried to cut samples with similar gain. The uploaded WavPack samples all have the native gain and the track_peak of the full track. I’ve also duplicated the track gain to the album gain.

http://guruboolez.free.fr/MPC/quiet_tracks_replaygained.zip

Two appendices in this zip file: a piano sample for which the track gain of the sample doesn’t really match the track gain of the full track (+40 dB instead of +25 dB); and a very noisy track with which musepack doesn’t have any problem, despite the high gain correction.


This report is probably the last one I’ll do for MPC (a developer has claimed a lack of interest in improving classical at --standard), but I nevertheless hope it will help to improve the encoder. Playing with the command line (in order to change ATH or noise sensitivity) might be enough to solve or reduce this issue; therefore, every MPC user could contribute. In the meantime, users should be aware of this issue.
shadowking
I confirm serious problems under these special listening conditions.

Thank you for making it clearer for me. Now I understand it better!
Acid8000
What I understand from your post guru is that at > insane, this effect isn't significant. I hope this is the case.

Edit: Whoa, this is only noticeable with ReplayGain higher than +20 dB or so. The lowest classical on my system is +3 dB or so. Any rock/metal/rap/electronica is gained to -9 dB sometimes. Looks like I shouldn't really be concerned. Phew. :)
rjamorim
I think you should put more creativity into your report. Maybe writing it entirely in haiku, and adding pictures of women in bikini next to the spectrograms :P


Hehe. Anyway, thank-you very much for this very enlightening report, Guru.
Gambit
I haven't seen this mentioned anywhere, so I thought I'd quote Case from IRC:
QUOTE
<cse> btw, here's something Klemm says about encoding highly dynamic movie tracks. I think this is valid for classical too:
<cse> 3. Also I suggest to use the option
<cse>        --standard --ath_gain -14
<cse>    with movie soundtracks.
<cse>    For other quality settings
<cse>      --quality x --ath_gain 16-6*x
<cse>    This lowers ATH by 14 dB relative to the standard.
<cse>    Feeding of mppenc should be done with 24 bit when possible.
<cse>    Use a 24 bit AC3decoder.
Gabriel
The obvious workaround is to check the track gain before encoding, then adjust the ath level according to the gain.

The fix for the encoder would be to adjust dynamically its ath level. For a vbr encoder it is very important as you can not rely on the target bitrate "safeguard" as in cbr.
Lefungus
QUOTE (guruboolez @ Jun 23 2005, 11:22 AM)
This report is probably the last one I’ll do for MPC (a developer has claimed a lack of interest in improving classical at --standard), but I nevertheless hope it will help to improve the encoder.


I'd like to know where you got that claim :)
To make things clear, the encoder side is unmaintained. I'm not aware of anyone actually trying to improve it, or increase its transparency even for non-classical music. (feel free to prove me wrong)
Recent releases were bugfixes or nice additions, but no changes were made to the psy-model itself. Some extremely minor patches for tag writing will come soon and that will be all.
So unless Klemm gives some input, nothing will come out of this stuff. The codec is just fading away, losing its relevance little by little (and all mpc haters drop tears of joy and happiness).
mtm
guruboolez, thank you very much for your input. I think your findings are very valuable.

I downloaded your sample set and tried to test according to Gambit's (Case's ? Frank's ? ;) ) suggestions, but it seems my hearing will need a few more days of peace after I went to The Mars Volta concert on Tuesday. :(
Dibrom
QUOTE (Lefungus @ Jun 23 2005, 11:33 AM)
QUOTE (guruboolez @ Jun 23 2005, 11:22 AM)
This report is probably the last one I’ll do for MPC (a developer has claimed a lack of interest in improving classical at --standard), but I nevertheless hope it will help to improve the encoder.

I'm not aware of anyone actually trying to improve it, or increase its transparency even for non-classical music. (feel free to prove me wrong)
Recent releases were bugfixes or nice additions, but no changes were made to the psy-model itself.


Well, I've been working on encoder changes, including a complete rewrite, but it has kind of taken a back seat to my player for the moment. I've made many changes to the psymodel code (many functions have been completely rewritten), although they are speed optimization oriented and the output is made to be identical. I haven't gotten far enough to really be concerned with changing things at the quality level yet.

I think that if someone could get feedback from Frank about the best way to handle this (i.e., algorithms, etc.), I could try to implement his ideas since it seems maybe nobody else will.

QUOTE
The codec is just fading away, losing its relevance little by little (and all mpc haters drop tears of joy and happiness).
*


I think a lot of this has to do with a certain lack of visibility. It also seems that many people (not all of course) involved with MPC don't get along too well with HA these days either, which sort of leads to a strained relationship as far as potential developers might be concerned.

I would personally be a bit sad to see MPC fade into irrelevancy because of lack of development since it was the first codec I used that I was really impressed with once I started paying serious attention to encoding quality. For the most part, it's still one of the best too.

@guruboolez: Thanks for the report.
rjamorim
QUOTE (Lefungus @ Jun 23 2005, 04:33 PM)
The codec is just fading away, losing its relevance little by little (and all mpc haters drop tears of joy and happiness).
*


I honestly don't see that as a surprise (but that's maybe because I would probably fall into your definition of "mpc hater")

The codec's biggest selling points were always quality and speed. While these features really set MPC apart during its heyday (2000~2002), nowadays the distinction with other codecs isn't that obvious. In 2001 Vorbis was still at its release candidates, and was slow. We had no Nero or iTunes, so the only option for us AAC lovers was the painfully slow Psytel. Lame represented a format that was probably already at the end of its improvement potential, and was quite slow as well.

Since then, Vorbis reached 1.0 and later 1.1. Quality improved a lot, and Lancer showed us you can have very, very fast Vorbis encoding with minimal quality tradeoffs. iTunes and Nero AAC were released, bringing AAC quality to a whole new level and making encoding much faster while at it. And the Lame developers seem set on amazing us with each new release, pulling MP3's quality much beyond what everybody thought would be the limit. And, of course, speed improved a lot there as well.

Now that the other formats are managing to catch up with MPC in its selling points, its limitations are starting to become evident, as the advantages no longer make up for them. Lack of hardware support, lack of multichannel support, can't be used with movies, can't be split and merged, patenting situation is unclear, development stalled... the list is long.
mtm
QUOTE (Dibrom @ Jun 23 2005, 11:01 PM)
I would personally be a bit sad to see MPC fade into irrelevancy because of lack of development since it was the first codec I used that I was really impressed with once I started paying serious attention to encoding quality.  For the most part, it's still one of the best too.
*
I think, quite a lot of people would... It was exactly the same with me and I still use it exclusively.

QUOTE (Dibrom @ Jun 23 2005, 11:01 PM)
I think that if someone could get feedback from Frank about the best way to handle this (i.e., algorithms, etc.), I could try to implement his ideas since it seems maybe nobody else will.
*
Thank you, Dibrom. :wub: Even *if* it doesn't work out.


[ Off topic: where's that darn "beer" emoticon ? ;) ]
CiTay
Thanks again for that summary, guruboolez. I already sent an email to Frank Klemm about this yesterday. I'm pretty sure he'll send me some comments about it, but as always, it may take a while.

QUOTE
Off topic: where's that darn "beer" emoticon ?


Ah yes... it got lost during some forum software updates. There, I added it again for you. :beer:
CiTay
Frank replied from work that he will comment as soon as he has some free time.
mtm
My sincerest thanks to everyone involved.
guruboolez
QUOTE (Gambit @ Jun 23 2005, 03:41 PM)
QUOTE
<cse> btw, here's something Klemm says about encoding highly dynamic movie tracks. I think this is valid for classical too:
<cse> 3. Also I suggest to use the option
<cse>        --standard --ath_gain -14
<cse>    with movie soundtracks.
<cse>    For other quality settings
<cse>      --quality x --ath_gain 16-6*x
<cse>    This lowers ATH by 14 dB relative to the standard.
<cse>    Feeding of mppenc should be done with 24 bit when possible.
<cse>    Use a 24 bit AC3decoder.

*


Thanks for quoting this information.
--ath_gain -14 works very well, and solves all issues (I didn't carefully try to hear the smallest differences). Good news: bitrate inflation is apparently very limited for most tracks (except low-volume ones, of course).

Also interesting to note: applied to the --radio profile, this additional switch increases the quality by reducing the level of audible artifacts (classical samples). Bitrate nevertheless inflates by 15...20 kbps. But it seems that with classical music --radio --ath_gain -8 performs better than --quality 4.xx (at comparable bitrate). Apparently, musepack suffers from ATH issues at lower profiles (< --standard), and could maybe benefit from tuning in this area to improve overall quality.
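
Klemm's rule of thumb quoted above (--ath_gain 16-6*x for --quality x) yields exactly the two values used in this post. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the quality numbers assume the profile numbering in which 4 is radio and 5 is standard.

```python
def klemm_ath_gain(quality: float) -> float:
    """ATH offset in dB from the quoted rule of thumb: 16 - 6 * quality."""
    return 16 - 6 * quality


# Assumed numbering: 4 = radio, 5 = standard, 6 = extreme, 7 = insane.
for quality in (4, 5, 6, 7):
    print(f"--quality {quality} --ath_gain {klemm_ath_gain(quality):g}")
```

For quality 5 this gives -14 and for quality 4 it gives -8, matching --standard --ath_gain -14 and --radio --ath_gain -8.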
GeSomeone
QUOTE (guruboolez @ Jun 23 2005, 11:22 AM)
This report is probably the last one I’ll do for MPC (a developer has claimed a lack of interest in improving classical at --standard), but I nevertheless hope it will help to improve the encoder.

Guruboolez, thanks for the detailed report on this issue in (still) my favorite lossy encoder. Please don't be put off too quickly by the hesitant reaction of "mpc development". We know how the situation is :rolleyes:.
The "workaround", as posted by Gambit, alone may have been worth it. Maybe this, which looks like a fixable issue, can spark the interest of developers to have a go at it?

QUOTE (Dibrom @ Jun 23 2005, 11:01 PM)
I've made many changes to the psymodel code, although they are speed optimization oriented and the output is made to be identical.
I'm surprised, I always thought the speed was very good.

/related rant
It was my feeling that Frank Klemm, at the time the last alphas were released by him, was very concerned about bitrate bloat. Also the ATHs were redefined when introducing the --quality scales. Maybe this issue crept in at the same time (but maybe it was waiting to be brought to the front by Guruboolez all the time :) )

BTW Replaygain of +20dB is pretty extreme to me. I'm always cautious with positive RG because it can result in unwanted clipping if there are peaks in the same track/album.
I am aware that in this case it's just an indication when the reported issue occurs. And, apart from RG, someone could just play it very LOUD and maybe notice the same thing.
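
The clipping concern is easy to quantify: a gain of G dB multiplies every sample by 10^(G/20), so a track clips once its peak times that factor exceeds full scale. A minimal sketch, assuming the peak is normalized so that 1.0 means digital full scale; the function names are illustrative only.

```python
import math


def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def would_clip(peak: float, gain_db: float) -> bool:
    """True if a track with this normalized peak exceeds full scale after the gain."""
    return peak * gain_to_scale(gain_db) > 1.0


def max_safe_gain_db(peak: float) -> float:
    """Largest gain in dB that keeps the given peak at or below full scale."""
    return -20 * math.log10(peak)


print(would_clip(0.05, 20))   # very quiet track: peak 0.05 scaled by 10 stays below 1.0
print(would_clip(0.2, 20))    # one louder passage in the same track is enough to clip
```

So a +20 dB gain is only safe when the whole track peaks at or below 0.1 of full scale.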
xmixahlx
dibrom's speed enhancements were focused on PPC/etc AFAIK


later
Dibrom
QUOTE (xmixahlx @ Jun 27 2005, 11:32 AM)
dibrom's speed enhancements were focused on PPC/etc AFAIK
*


A significant portion of them were, but the later changes I've made should improve speed on x86 also, though the gains should be smaller.

A complete rewrite with a more modular codebase could probably allow for a lot more significant optimizations with little hassle (in addition to other pluses like easier maintenance, and an easier transition to a different bitstream like SV8), which is what I was starting on right about the time I shifted to working more on my audio player.

I don't know when I'll release these changes, but input from Frank about possible quality fixes should be pretty independent of most of what I've done so assuming that it's not a huge hassle to fit into the current psymodel (and I can't see why it would), then it should be pretty simple to implement and release.
CiTay
As promised, here is the answer that I got from Frank Klemm today. I translated it from German.


CODE
The calculated masked threshold is indeed depending on the level. It changes if lower levels
are approached. This modification was made sometime between encoder version 1.06 and 1.1.

With high levels, the NMR (noise-to-mask-ratio) was raised by 0.5 dB, with low levels,
it was lowered. The masked threshold (ATH) was lowered by 6 dB in total.

The original behavior was that, up to a certain threshold, things were coded with full NMR,
and after that it would suddenly get muted. A signal around that switching threshold
produced audible artifacts, despite the fact that many bits were used for coding.

The current behavior is that the coding gradually gets worse with very low levels and there's
almost no usable signal in the end. Only when this point is reached, the coding is stopped.

When you're looking at the error signal over the signal strength, there's a slowly declining
function that approaches the ATH from above. The old behavior first caused the error signal to
fall ca. 20 dB below the ATH and only raise to ATH-level again when the coding was stopped.

Extensive listening tests with headphones were conducted (headphones because of the high
listening level). For listening material, among others, the Bolero by M. Ravel was used.
Volume was adjusted to ca. 114 dB SPL at -0 dB signal strength.

At this volume, noise in the recording and quantization artefacts are already an issue with
many 16 bit recordings. As long as this level is not (clearly) exceeded, the quality of the
coding was clearly better, despite the lower bitrate (even though the NMR was raised by 0.5 dB
and the ATH lowered by 6 dB, there are spare bits with almost every kind of music).
The fluctuation in the coding - which was caused by activation and deactivation of subbands -
disappears.

But if you turn up the volume clearly above this level (ca. from 120 dB SPL at -0 dB signal
strength on), you hear the coding errors which are then pretty different from the older versions.

Now, if you disregard the question "what good are replaygains above +10 dB?" (with classical
music, only album-based replaygain should be used anyway), the problem can be solved by
lowering the ATH. It will result in a slightly higher bitrate.

If this problem is relevant for daily use in any kind of way, i dare say "no".
For most pop titles, you can increase the ATH by 30 dB and still not notice anything.
Even with classical music, 10 dB are often possible.

A clean solution is not possible with a 1-pass-coder; you would first need a rough
volume estimation of the whole song to estimate the maximum position of the volume knob -
and even then, you could still re-adjust during the title.

Furthermore, i would recommend corrections within Replaygain. A "quick-to-hack" solution
would be that the title-based replaygain of neighboring tracks in an album must not
differ by more than 6 dB.

From these (calculated) values:

- 7,81 dB
- 6,41 dB
- 7,61 dB
+4,81 dB
- 8,11 dB
- 6,12 dB
+1,12 dB
- 9,12 dB

you will then get:

- 7,81 dB
- 6,41 dB
- 7,61 dB
- 2,11 dB        // raised to -8,11 + 6
- 8,11 dB
- 6,12 dB
- 3,12 dB        // raised to -9,12 + 6
- 9,12 dB


Then, short voice tracks/interludes/preludes etc. don't get boosted to +40 dB anymore.
Because this is currently the only limit: Replaygain values of more than +40 dB are
simply reduced to 0 dB (not really that clean either). This limit should also be
reduced to +12 dB (corresponds to K-26).

If this proposal is taken up, i could send some reasonably tuned example code.
Somewhere in the depths of my hard disk there should be something.
In that code, the increase of these "holes" is also depending on the Album-replaygain,
the title length and sometimes from more distant neighboring tracks.
A "1 second digital null" before the first title approximately gets the value of the
first track, a "2 second digital null" in between two tracks gets the mean value
of both tracks.



static const Profile_Setting_t  Profiles [16] = {
   { 0 },
   { 0 },
   { 0 },
   { 0 },
   { 0 },
/*    Short   MinVal  EarModel  Ltq_                min   Ltq_  Band-  tmpMask  CVD_  varLtq    MS   Comb   NS_        Trans */
/*    Thr     Choice  Flag      offset  TMN   NMT   SMR   max   Width  _used    used         channel Penal used  PNS    Det  */
   { 1.e9f,  1,      300,       30,    3.0, -1.0,    0,  106,   4820,   1,      1,    1.,      3,     24,  6,   1.09f, 200 },  // 0: pre-Telephone
   { 1.e9f,  1,      300,       24,    6.0,  0.5,    0,  100,   7570,   1,      1,    1.,      3,     20,  6,   0.77f, 180 },  // 1: pre-Telephone
   { 1.e9f,  1,      400,       18,    9.0,  2.0,    0,   94,  10300,   1,      1,    1.,      4,     18,  6,   0.55f, 160 },  // 2: Telephone
   { 50.0f,  2,      430,       12,   12.0,  3.5,    0,   88,  13090,   1,      1,    1.,      5,     15,  6,   0.39f, 140 },  // 3: Thumb
   { 15.0f,  2,      440,        6,   15.0,  5.0,    0,   82,  15800,   1,      1,    1.,      6,     10,  6,   0.27f, 120 },  // 4: Radio
   {  5.0f,  2,      550,        0,   18.0,  6.5,    1,   76,  19980,   1,      2,    1.,     11,      9,  6,   0.00f, 100 },  // 5: Standard
   {  4.0f,  2,      560,       -6,   21.0,  8.0,    2,   70,  22000,   1,      2,    1.,     12,      7,  6,   0.00f,  80 },  // 6: Xtreme
   {  3.0f,  2,      570,      -12,   24.0,  9.5,    3,   64,  24000,   1,      2,    2.,     13,      5,  6,   0.00f,  60 },  // 7: Insane
   {  2.8f,  2,      580,      -18,   27.0, 11.0,    4,   58,  26000,   1,      2,    4.,     13,      4,  6,   0.00f,  40 },  // 8: BrainDead
   {  2.6f,  2,      590,      -24,   30.0, 12.5,    5,   52,  28000,   1,      2,    8.,     13,      4,  6,   0.00f,  20 },  // 9: post-BrainDead
   {  2.4f,  2,      599,      -30,   33.0, 14.0,    6,   46,  30000,   1,      2,   16.,     15,      2,  6,   0.00f,  10 },  //10: post-BrainDead
};


The Ltq_offset entry is the alteration of the masked threshold against the standard model.
A reduction by 6 dB decreases the ATH by 6 dB in the whole frequency range.

The value left of that (EarModel) can be used for ATH fine-tuning for higher frequencies.
An increase of 20 results in an ATH decrease of 1.5 dB at 10 kHz and 6 dB at 20 kHz.

--quality 6 against --quality 5 has the following differences in the ATH with this:

- 6,0 dB for low frequencies
- 6,5 dB for 8 kHz
- 7,0 dB for 11 kHz
- 8,0 dB for 16,3 kHz
- 9,0 dB for 20 kHz
-10,0 dB for 23 kHz

If there are further questions or if something was unintelligible, just keep asking.
I still have no time, but when i have 15 minutes silence, i can answer such things.

Motto of the day: The ingeniousness of a construction lies within its simplicity.
Everyone can build something complicated. (Sergeij P. Koroljow)
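
The --quality 6 versus --quality 5 figures in Frank's reply can be cross-checked against the profile table. The sketch below encodes only his two statements (Ltq_offset shifts the whole curve; +20 in EarModel lowers the ATH by 1.5 dB at 10 kHz and 6 dB at 20 kHz) and assumes, as a simplification, that the EarModel effect scales linearly with the step size. It is not taken from the mppenc source, and the names are hypothetical.

```python
# ATH reduction in dB per +20 EarModel units, at the two frequencies Frank quotes.
EARMODEL_DB_PER_20 = {10_000: 1.5, 20_000: 6.0}


def ath_shift_db(ltq_offset_delta: float, earmodel_delta: float, freq_hz: int) -> float:
    """Approximate ATH change in dB between two profiles at freq_hz.

    Assumes the EarModel contribution is linear in the EarModel difference.
    """
    return ltq_offset_delta - (earmodel_delta / 20) * EARMODEL_DB_PER_20[freq_hz]


# Standard (EarModel 550, Ltq_offset 0) -> Xtreme (EarModel 560, Ltq_offset -6)
print(ath_shift_db(-6, 10, 20_000))  # -9.0
print(ath_shift_db(-6, 10, 10_000))  # -6.75
```

The 20 kHz result agrees with the -9,0 dB in his list, and the 10 kHz result falls between his -6,5 dB (8 kHz) and -7,0 dB (11 kHz).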
CiTay
I'm a bit surprised nobody has anything to say to this, especially guruboolez?

Anyway, to summarize Frank Klemm's comments in a more simple manner:

- In a version between 1.06 and 1.1, the coding of low level (not low frequency!) signals was changed, to avoid artifacts that were caused when such a signal approached a certain lower threshold which made it fluctuate between "encode" and "not encode"

- The new method avoids that fluctuation by gradually decreasing quality towards the lower threshold, leading to a gentle deterioration and no audible artifacts even with quite "silent" music under normal circumstances, which was checked in listening tests

- Ridiculously high Replaygain values however (usually in track gain) can make artifacts with the new method audible again

- Replaygain in its current state has some shortcomings for very dynamic albums

- The new method could be tuned by lowering the ATH (absolute threshold of hearing); basically making the "simulated hearing" a bit more sensitive

- For daily use and normal listening conditions, this problem is not relevant

- Possible solutions include the tweaking of the ATH curves and modifications to Replaygain
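
For reference, Frank's "quick-to-hack" ReplayGain proposal can be sketched in a few lines. This is one reading of his example, not his code: it assumes each track gain is capped at the lowest gain of its direct neighbors plus 6 dB, that neighbors are read from the original (uncapped) values, and that the +12 dB ceiling he mentions is applied as a simple clamp. The function and parameter names are hypothetical.

```python
def limit_track_gains(gains_db, max_step_db=6.0, max_gain_db=12.0):
    """Cap each track gain at (lowest adjacent track gain + max_step_db)
    and at max_gain_db. Neighbors are taken from the original list."""
    limited = []
    for i, gain in enumerate(gains_db):
        # Direct neighbors within the album (one or two tracks).
        neighbors = gains_db[max(i - 1, 0):i] + gains_db[i + 1:i + 2]
        cap = max_gain_db
        if neighbors:
            cap = min(cap, min(neighbors) + max_step_db)
        limited.append(round(min(gain, cap), 2))
    return limited


# The calculated values from Frank's reply.
gains = [-7.81, -6.41, -7.61, 4.81, -8.11, -6.12, 1.12, -9.12]
print(limit_track_gains(gains))
```

With these inputs, only the fourth and seventh tracks change (to -2.11 dB and -3.12 dB), which reproduces the second list in his reply.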
rjamorim
QUOTE (CiTay @ Jul 2 2005, 05:40 PM)
I'm a bit surprised nobody has anything to say to this, especially guruboolez?
*


Well, Guruboolez already said he's not planning to test Musepack again after the terrible behaviour displayed by the project's maintainer in face of useful and valid test results. So I suspect it makes no difference to him anymore what Klemm says.
CiTay
QUOTE (rjamorim @ Jul 2 2005, 10:56 PM)
Well, Guruboolez already said he's not planning to test Musepack again after the terrible behaviour displayed by the project's maintainer in face of useful and valid test results. So I suspect it makes no difference to him anymore what Klemm says.
*


Why don't you let him speak for himself? Some days ago he showed that he still is interested in a fix for this issue. No need trying to verbally divide things further than they already are.
Vertigo
Hahaha, I love it when robert comes in to save the day for musepack. =D
rjamorim
QUOTE (Vertigo @ Jul 2 2005, 08:54 PM)
Hahaha, I love it when robert comes in to save the day for musepack. =D
*


hehe. I actually have been, from the beginning, defending my good friend Guruboolez from bullshit coming from all sides. :)
Dibrom
Do we need to split this thread again to stay on topic?
Cyaneyes
Just to comment on Frank's thoughts on Track gain...

Doesn't his proposal defeat the purpose of Track gain? If you want to keep the volume differences between tracks intact, you use album gain. He appears to be proposing a kind of half trackgain, half albumgain system.

What if you're playing tracks at random and come across a track that's not raised to the same gain as the others? In this context, having a rule that says neighboring album tracks must not vary by more than x number of db makes no sense.

If you wish to keep volume differences in tracks intact, but need to decrease dynamic range (because of a noisy listening environment, etc) this can be accomplished through DSP.
Andavari
QUOTE (Cyaneyes @ Jul 2 2005, 07:43 PM)
Just to comment on Frank's thoughts on Track gain...
*

I'd think that an additional switch could be added into replaygain.exe and mppenc.exe such as "--cvl" to tell them both they're dealing with classical, low volume material, etc. I'd also think that it would be the safest way to approach the problem since it would only be used for specific material that required it, without affecting material that doesn't.
Lyx
*nevermind - i mixed up trackgain and albumgain*
xmixahlx
...if this problem only occurs in music with ridiculous replaygain values (+40 dB for example)

perhaps wavegain is the answer?

*/me dodges flying debris*


later
guruboolez
QUOTE (CiTay @ Jul 2 2005, 09:40 PM)
I'm a bit surprised nobody has anything to say to this, especially guruboolez?


What am I supposed to add? Sorry, but I don't see what to do or to say. I made a test, and now it's up to the developers to work on the problem. A new encoder has been released since, but the only change affects encoding speed, not quality. The only thing I can do is a bench...
Anyway, Roberto at least has understood what I've said previously. If some musepack developer considers a listening test an attack, I won't insist. It's pretty simple. The problem I've submitted in this topic is just a "farewell present" to musepack. I've known about this problem for a long time, and the only thing I could do was to submit it. I just hope that it will help MPC to progress. Members should also be aware of this issue.


QUOTE
Anyway, to summarize Frank Klemm's comments in a more simple manner:
- In a version between 1.06 and 1.1, the coding of low level (not low frequency!) signals was changed, to avoid artifacts that were caused when such a signal approached a certain lower threshold which made it fluctuate between "encode" and "not encode"

I've posted two listening tests that lead to this conclusion. Frank Klemm gives an explanation. What more should I do? I can't work on the encoder and tweak it.

QUOTE
- Ridiculously high Replaygain values however (usually in track gain) can make artifacts with the new method audible again

Not again. 1.01j and even 1.78c suffer from the same big issue.

QUOTE
- Replaygain in its current state has some shortcomings for very dynamic albums

Not exactly. ReplayGain is doing its job, and perfectly. The problem is Musepack + Replaygain. Many other audio formats don't have this problem at this bitrate, and can be used with RG even with very high gain correction: vorbis, aac, mp3, DualStream and WavPack lossy. The MPC VBR model is flawed, at least with the current tuning. ReplayGain is clean. There's only one way to correct the problem: working on MPC. Changing ReplayGain to work in a weird Track/Album mix makes no sense at all. And saying that Track Gain is not suitable for classical or doesn't correspond to a "normal listening condition" is just an easy way to hide the real problem, or to minimize it. A more serious alternative would be to follow the example of other formats: they work fine with RG in either track or album mode, with either classical or metal music. LAME, at least with CBR/ABR, had similar issues, and they were solved by working on the code without changing RG behaviour.
MPC SV7 had serious clipping issues in the past; they were corrected by Klemm with the introduction of the --xlevel tool and not by convincing audio engineers to stop their loudness war.
Honestly, defending MPC by discrediting specific listening conditions or minimizing the problem won't make MPC sound better.
The ath_gain switch is a working solution (at least to this problem), but at the price of efficiency.


QUOTE
- For daily use and normal listening conditions, this problem is not relevant

Are you suggesting that ReplayGain Track mode is not reflecting "daily use" or "normal listening conditions"? :blink:
markanini
QUOTE (guruboolez @ Jul 5 2005, 01:33 PM)
QUOTE
- For daily use and normal listening conditions, this problem is not relevant

Are you suggesting that ReplayGain Track mode is not reflecting "daily use" or "normal listening conditions"? :blink:
*


I don't think many use track gain for classical music.
Lime
I think a workaround is easy. Just do a replaygain scan at the same time as encoding, and if the RG value is over +20 dB, for example, then recompress the track with the --ltq switch. You can further tweak that by using a certain --ltq value for, let's say, tracks with +15 dB to +20 dB, and another for tracks over +20 dB.

Musepack's encoding is very fast, so there is room to lose some speed, and also, the RG calculation doesn't need to be perfect; just a rough estimate of the value would be enough.
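
A rough sketch of this two-tier idea follows. It only picks encoder options from a track gain that has already been measured; it does not run the encoder. To stay with switches actually quoted in this thread it uses --ath_gain rather than --ltq, and the per-tier values (-8 and -14) are placeholders borrowed from the posts above, not tuned settings. The function name and thresholds are assumptions for illustration.

```python
def mppenc_options_for_gain(track_gain_db: float) -> list[str]:
    """Pick encoder options from a (rough) track gain estimate in dB."""
    options = ["--standard"]
    if track_gain_db > 20:
        # Very quiet track: lower the ATH by the full amount Klemm suggested.
        options += ["--ath_gain", "-14"]
    elif track_gain_db >= 15:
        # Moderately quiet track: placeholder intermediate value.
        options += ["--ath_gain", "-8"]
    return options


for gain in (3.0, 17.5, 26.0):
    print(gain, " ".join(mppenc_options_for_gain(gain)))
```

Tracks below +15 dB keep the plain --standard profile, so the bitrate cost is only paid where the problem can show up.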
Raptus
QUOTE (Gabriel @ Jun 23 2005, 11:33 AM)
The fix for the encoder would be to adjust dynamically its ath level.

Doing so on a frame to frame basis could even save some bitrate, considering that ath could be raised for louder parts. I don't know if this would unbalance the current psy-model, though...
Shade[ST]
Wouldn't this type of adjustment make the ATH useless altogether? I'm not sure how the system works, but analysing on a frame-by-frame basis seems ridiculous to me...
Gabriel
QUOTE
Wouldn't this type of adjustment make the ATH useless altogether? I'm not sure how the system works, but analysing on a frame-by-frame basis seems ridiculous to me...

Then consider it to be the effective threshold of hearing instead of the absolute one. The ETH is affected by the middle ear behavior.

If I remember correctly, Frank also thought that dynamically adjusting the ATH was quite "strange". However it appears that this scheme works in some encoders, and normal humans have a middle ear doing adaptive amplification/reduction of loudness.
guruboolez
QUOTE (markanini @ Jul 5 2005, 02:45 PM)
I don't think many use track gain for classical music.

I'm using track gain every day on my portable player. Quiet tracks are simply inaudible outdoors. MPC has poor hardware support, but with Rockbox, some jukeboxes will probably support it. Then the problem will start to annoy some classical fans.
BTW, the problem is also sometimes audible at +10 dB; I have some discs with Track adjustment at this level, and issues (minor but real) become audible at home, with Album Gain mode. The problem remains...
Dibrom
QUOTE (guruboolez @ Jul 5 2005, 04:33 AM)
QUOTE (CiTay @ Jul 2 2005, 09:40 PM)
I'm a bit surprised nobody has anything to say to this, especially guruboolez?

Anyway, Roberto at least has understood what I said previously. If some Musepack developer considers a listening test an attack, I won't insist.


I think it's probably worth noting that a single disagreement is not necessarily representative of everyone involved, or everyone capable of being involved. It's been said before, but maybe it needs to be said again...

QUOTE
QUOTE
- Replaygain in its current state has some shortcomings for very dynamic albums


Not exactly. ReplayGain is doing its job, and perfectly.


Well I'd agree with the "replaygain is doing its job" part, but the issue is that the way in which it is doing its job is not necessarily ideal for a psychoacoustic encoder in certain scenarios. That is how I understand Frank's explanation.

QUOTE
The problem is Musepack + Replaygain. Many other audio formats don't have this problem at this bitrate, and can be used with RG even with very high gain correction: vorbis, aac, mp3, DualStream and WavPack lossy. The MPC VBR model is flawed, at least with the current tuning.


I'm not sure you can really conclude that the MPC VBR model is flawed from this. If the issue is simply that MPC is cutting things closer to the threshold than most other encoders, given expected listening conditions, then I don't see what the problem is (with the model I mean).

Under unexpected listening conditions, problems happen. This, in my opinion, is similar to the situation when encoding content to be played back with surround sound processing later. Some codecs don't perform well here, and IMO, shouldn't necessarily be expected to. Of course some do perform well here, but then the question could be asked as to whether they are sacrificing efficiency in other cases simply to deal with an unlikely listening scenario...

In either case, the fix is usually simple. Encode with different settings if you plan to listen under conditions which you know the encoder is not tuned for by default. In the case of MPC, that's playing with the ath at encode time, with other encoders and surround sound processing, sometimes that's playing with the js settings.

QUOTE
ReplayGain is clean. There's only one way to correct the problem: working on MPC.


That's one way to correct the problem, but I don't think it's the only one given what has been said.

QUOTE
Changing ReplayGain to work in a weird Track/Album mix makes no sense at all. And saying that Track Gain is not suitable for classical or doesn't correspond to a "normal listening condition" is just an easy way to hide the real problem, or to minimize it.


I don't think Frank was saying Track Gain is not a normal listening condition. But using Track Gain when listening to certain highly dynamic classical music and using the standard MPC preset tunings is "not a normal listening condition." I think it's important to define "normal listening condition" here. I don't think Frank means that perhaps other people are not listening this way, but that from the standpoint of the encoder and the way in which the psymodel was designed (and indeed what could be expected from average conditions under which the model must perform) -- in that case it is not a "normal listening condition."

QUOTE
A more serious alternative would be to follow the example of other formats: they work fine with RG in either track or album mode, with either classical or metal music. LAME, at least with CBR/ABR, had similar issues, and they were solved by working on the code without changing RG behaviour.


Well I'm a bit curious.. I don't follow LAME development much these days, but did the "fixes" in LAME for this end up resulting in higher bitrates across the board?

QUOTE
MPC SV7 had serious clipping issues in the past; they were corrected by Klemm with the introduction of the --xlevel tool, not by convincing audio engineers to stop their loudness war.


I think that this was clearly a different situation. MPC had clipping issues because of a technical design flaw from early on (as I understand it), not because of an estimation about reasonable conditions under which a signal can be expected to be masked according to likely playback volume.

QUOTE
Honestly, defending MPC by discrediting specific listening conditions or minimizing the problem won't make MPC sound better.
The ath_gain switch is a working solution (at least to this problem), but at the price of efficiency.


Yes, that is true. But I don't think Frank was discrediting. I think he was simply explaining. And I must say that his explanation seems to make sense to me. I also realize that from an end user point of view it is annoying and perhaps frustrating to have to have "special conditions" when encoding this type of music with MPC versus other formats. But from a design and coding point of view, it would also seem to me to be frustrating to have to modify the psymodel to deal with something which is unusual for the given reasons and could possibly reduce efficiency across the board (if I'm wrong about that, then nevermind). By that, I mean that changing this in the actual psymodel would probably result in efficiency loss similar to changing the ath_gain switch.

Personally, I don't listen to much classical, but I do listen to a lot of music with high dynamic range. I never use Track gain. But given what has been said, I don't mind adjusting the ath when encoding this sort of material. Of course a more automatic solution would be desirable, and maybe this could be implemented (in the frontend). But if the required solution is to modify the ath for the presets to deal with this one particular case, I'm not sure if that's a very good fix either...

Of course, maybe there are other possibilities or maybe I'm just missing something...
Dibrom
QUOTE (Gabriel @ Jul 5 2005, 07:04 AM)
If I remember correctly, Frank also thought that dynamically adjusting the ATH was quite "strange". However it appears that this scheme works in some encoders, and normal humans have a middle ear doing adaptive amplification/reduction of loudness.


If the adjustments are made according to a psychophysical phenomenon, that's one thing, and it seems to me to be desirable to have that included in the way the psymodel works.

But attempting to adjust ATH according to how the user might play with their volume control (e.g., Replaygain Track mode on highly dynamic classical music) is another variable entirely, and IMO not necessarily one that the psymodel should even attempt to tackle. Of course that should be up to the encoder designer and perhaps the expected userbase, but from a technical and conceptual point of view, it seems unrelated to the psymodel itself.

If you plan to factor in all of these sorts of cases, then IMO you also need to factor in many different types of possible postprocessing simply to stay consistent. But this is a poor design choice I think and results in a much less conceptually clean psymodel -- it is not concerned anymore simply with how the user hears things, but also about how they play them back. Since this is a whole lot more difficult (eventually impossible?) to predict, it seems to make sense to me to push this sort of prediction work off onto the client (the user) -- i.e., have them modify encode time settings to deal with the sort of situation under which they will be listening if it happens to deviate significantly from an expected (and simple) set of baseline assumptions the encoder makes.
guruboolez
QUOTE
I think it's probably worth noting that a single disagreement is not necessarily representative of everyone involved, or everyone capable of being involved.  It's been said before, but maybe it needs to be said again...

Right. But understand that I won't risk giving feedback to a development team which includes (one?) aggressive member who doesn't care about ABX methodology (calling it "flawed" or something like that). I'm testing for nothing in return, and I don't ask for anything other than polite answers.

QUOTE
I'm not sure you can really conclude that the MPC VBR model is flawed from this.  If the issue is simply that MPC is cutting things closer to the threshold than most other encoders, given expected listening conditions, then I don't see what the problem is (with the model I mean).


Vorbis was affected by the same issue at low bitrate, and you called it exactly that, a "flaw" ("No, this doesn't represent an "argument against VBR", it simply emphasizes a flaw in the Vorbis encoder").
Source: http://www.hydrogenaudio.org/forums/index....indpost&p=71576
wink.gif

QUOTE
Of course some do perform well here, but then the question could be asked as to whether they are sacrificing efficiency in other cases simply to deal with an unlikely listening scenario...

Possibly. But you are too quick to call it "an unlikely scenario". For ReplayGain, MPC is probably the most advanced lossy format (adjustments are stored in metadata or header, impressive Winamp plugin, support directly in mppdec).
This scenario is very usual, at least when people are listening to classical stuff.

QUOTE
In either case, the fix is usually simple.  Encode with different settings if you plan to listen under conditions which you know the encoder is not tuned for by default.

I don't consider it a valid solution. Tweaking an encoder to fit a specific situation is very uncomfortable. Last but not least, it goes against 4 years of HA's recommendation (use --preset and nothing else).


QUOTE
I don't think Frank was saying Track Gain is not a normal listening condition.  But using Track Gain when listening to certain highly dynamic classical music and using the standard MPC preset tunings is "not a normal listening condition."


On nomad conditions, it is. But again, the problem is also audible with Track Gain Mode with some albums. And only with mpc... Minimizing the problem won't solve it.

QUOTE
Well I'm a bit curious.. I don't follow LAME development much these days, but did the "fixes" in LAME for this end up resulting in higher bitrates across the board?

I've noticed it with ~128 kbps (ABR and CBR). Bitrate is therefore the same. I could upload samples for which 3.90.3 has poorer performance than... Blade after ReplayGaining it. 3.97 is near perfection.

QUOTE
I think that this was clearly a different situation.  MPC had clipping issues because of a technical design flaw from early on (as I understand it), not because of an estimation about reasonable conditions under which a signal can be expected to be masked according to likely playback volume.

I never encountered a clipping issue with my classical library (>1000 discs). For me, this problem of loudness clipping is as unusual as the situation I've described might appear to people listening to metal or something else.
In both cases, we have a technical issue. Clipping was solved by development, and this problem should be solved the same way.

People listening to different musical genres may encounter different issues. What MPC developers are trying to do is to minimize a problem which is more common than you expect. I don't want to start an idiotic flame war. What I'm trying to say is that MPC has problems, probably not audible to people listening to something other than classical. That's a pity, because I tested MPC on classical a year ago, and it performed very well.


QUOTE
By that, I mean that changing this in the actual psymodel would probably result in efficiency loss similar to changing the ath_gain switch.


But why? Other formats don't have this problem. And honestly, we can't say anymore that Vorbis, LAME or AAC are inefficient or are wasting bits.
Dibrom
QUOTE (guruboolez @ Jul 5 2005, 07:51 AM)
QUOTE
I think it's probably worth noting that a single disagreement is not necessarily representative of everyone involved, or everyone capable of being involved.  It's been said before, but maybe it needs to be said again...

Right. But understand that I won't risk giving feedback to a development team which includes (one?) aggressive member who doesn't care about ABX methodology (calling it "flawed" or something like that). I'm testing for nothing in return, and I don't ask for anything other than polite answers.


Well haven't you gotten polite answers in general? You had one argument, but other people are reading and listening, and are interested in hearing more.

I would just forget about it and move on. Yes, easier said than done, but it'd cause less strife. Why don't we just let the issue die? There are many people here interested in a civilized discussion, and they will read your posts. You don't have to participate anymore if you don't want to, but I wouldn't stop because of a single case. Up to you.

QUOTE
QUOTE
I'm not sure you can really conclude that the MPC VBR model is flawed from this.  If the issue is simply that MPC is cutting things closer to the threshold than most other encoders, given expected listening conditions, then I don't see what the problem is (with the model I mean).


Vorbis was affected by the same issue at low bitrate, and you called it exactly that, a "flaw" ("No, this doesn't represent an "argument against VBR", it simply emphasizes a flaw in the Vorbis encoder").
Source: http://www.hydrogenaudio.org/forums/index....indpost&p=71576
wink.gif


Hey, I can change my mind, right? smile.gif

But seriously, I don't remember that discussion and am too busy to go reread it all. My opinion on certain things has definitely changed though, and I suppose that post was written quite awhile ago.

I'm not so interested in one solution having to deal with all cases, at least at a given level of abstraction. I think it would be a perfectly fine and desirable solution for this to be handled without any user intervention, but I would probably "fix" it as a two-pass encoding scheme or something else rather than modifying the psymodel.

The problem here is where to make conceptual separation between encoder design to deal with psychoacoustic phenomena and design to deal with user playback schemes. Before I didn't care much about the separation, now I do.

QUOTE
QUOTE
In either case, the fix is usually simple.  Encode with different settings if you plan to listen under conditions which you know the encoder is not tuned for by default.

I don't consider it a valid solution. Tweaking an encoder to fit a specific situation is very uncomfortable. Last but not least, it goes against 4 years of HA's recommendation (use --preset and nothing else).


The radical emphasis on exclusive preset use in LAME originally had to do with the fact that LAME had many exposed experimental switches mixed in with regular switches. There was never a very good conceptual separation between the two, and many were undocumented or poorly documented.

There were also many, many myths about how the encoder performed with certain switch combinations. In my opinion at the time (and probably still now, on that specific point, since the frontend is still a mess) it was best to emphasize a single switch so as to completely do away with the rest of the mess that is the frontend.

If the frontend had been redesigned so that no harmful switches were exposed any longer, along with some sort of way to disallow clearly harmful switch combinations, then a single switch would not necessarily have been needed.

Since that never happened, the best solution was to only use one switch, otherwise people begin to be encouraged to use the preset but modify a little bit here, a little bit there, and pretty soon the whole point is lost...

MPC doesn't have this problem nearly as much, so it's not as big of a deal IMO to have some sort of extra switches used for certain cases.

QUOTE
QUOTE
I don't think Frank was saying Track Gain is not a normal listening condition.  But using Track Gain when listening to certain highly dynamic classical music and using the standard MPC preset tunings is "not a normal listening condition."


On nomad conditions, it is. But again, the problem is also audible with Track Gain Mode with some albums. And only with mpc... Minimizing the problem won't solve it.


So relative to listening to track mode replaygained classical music on nomad, the problem is common. How common is that, absolutely? Enough to force a design change in the psymodel? Maybe, I'm not sure...

Replaygain track mode performance in general is another situation, but again this is like some sort of post-processing. How can the encoder be expected to predict this without being given that information beforehand? Sure, you can make a guess about it, but this is sort of a hack (i.e., not an elegant solution from a design perspective), and only for a single case.

QUOTE
QUOTE
Well I'm a bit curious.. I don't follow LAME development much these days, but did the "fixes" in LAME for this end up resulting in higher bitrates across the board?

I've noticed it with ~128 kbps (ABR and CBR). Bitrate is therefore the same. I could upload samples for which 3.90.3 has poorer performance than... Blade after ReplayGaining it. 3.97 is near perfection.


Well that is indeed interesting. Given a fixed bitrate, I wonder how things were reshuffled to deal with the bitrate increase for frames that were given higher quality after making the modification. I wonder if quality decreased elsewhere? With ABR this would seem to have to be the case, because if the bitrate increased in quieter frames, the encoder would decrease bitrate elsewhere to hit the target. This might result in increased quality across the board, but is a compromise in the technical sense if quality is decreased elsewhere even if not noticed in most cases. With CBR it might be slightly different depending on how the bit reservoir is used. With pure VBR though, I would definitely expect the bitrate to simply increase.
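
The ABR argument can be put into numbers. A small Python sketch with invented figures (a 128 kbps target, 20% of the frames being quiet); `remaining_rate_kbps` is a hypothetical helper, not part of any encoder:

```python
def remaining_rate_kbps(target_kbps: float,
                        quiet_share: float,
                        quiet_kbps: float) -> float:
    """Average bitrate left for the non-quiet frames under a fixed ABR target.

    target_kbps : average bitrate the encoder must hit over the whole track
    quiet_share : fraction of frames that are quiet (0 <= share < 1)
    quiet_kbps  : average bitrate spent on those quiet frames
    """
    if not 0.0 <= quiet_share < 1.0:
        raise ValueError("quiet_share must be in [0, 1)")
    return (target_kbps - quiet_share * quiet_kbps) / (1.0 - quiet_share)


# Assumed example: quiet frames go from 100 to 120 kbps after a "fix".
before = remaining_rate_kbps(128.0, 0.2, 100.0)
after = remaining_rate_kbps(128.0, 0.2, 120.0)
print(f"other frames: {before:.1f} kbps before, {after:.1f} kbps after")
```

With these assumed numbers the other frames drop from 135 to 130 kbps on average. Whether that loss is audible is a separate question, which is the point made above.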
Vertigo
QUOTE
QUOTE
I think it's probably worth noting that a single disagreement is not necessarily representative of everyone involved, or everyone capable of being involved.  It's been said before, but maybe it needs to be said again...

Right. But understand that I won't risk giving feedback to a development team which includes (one?) aggressive member who doesn't care about ABX methodology (calling it "flawed" or something like that). I'm testing for nothing in return, and I don't ask for anything other than polite answers.

Regardless of your perspective, you are helping the development team by providing data. Not only that, you are being a valuable member of the community by giving us insight on the nature of the codec. You listen to music that not all people do, and thus provide essential tuning information. You are well respected and valued. I would suggest both you and the developer, him more so, not be childish in the least, agree there is a problem, and work on the data you've collected.

QUOTE
QUOTE
In either case, the fix is usually simple.  Encode with different settings if you plan to listen under conditions which you know the encoder is not tuned for by default.

I don't consider it a valid solution. Tweaking an encoder to fit a specific situation is very uncomfortable. Last but not least, it goes against 4 years of HA's recommendation (use --preset and nothing else).


This suggestion, I think, is outdated and unfounded. In a perfect world, this would be ideal, but we do not live in a perfect world. SV7 is still beta, and switches can be used to fix issues that may occur. To blindly encode without understanding the content you are working with is foolish. Once detection methods are perfect, then quality presets will not have switches. But I will say, HA has a wonderful pipedream tradition.
Dibrom
QUOTE
QUOTE
By that, I mean that changing this in the actual psymodel would probably result in efficiency loss similar to changing the ath_gain switch.

But why? Other formats don't have this problem.


Why? If you make the assumption that MPC is cutting things closer to the threshold, that means that its bit allocation is going to be lower than an equivalent bit allocation in another encoder. This is because the other encoder is already spending those extra bits that the MPC encoder is not. If you increase the encoding quality on lower volume parts, then you end up spending those extra bits on MPC that were not spent before, meaning the bitrate increases.

QUOTE
And honestly, we can't say anymore that Vorbis, LAME or AAC are inefficient or are wasting bits.


Well from an average human listener perspective we can't really know this, which is why we have a psymodel judge it for us. We would only be able to judge this indirectly through examining the workings of the psymodel, and the average human listener is unable to do this.

One way to get an indication though is to look at Vorbis performance in this situation before the changes were made and after the changes were made that "fixed" the problem. Did the bitrate increase? (I don't remember)
Vertigo
I think we need to send Guruboolez the HA Controversy Award, he's certainly earned it over the years.
rjamorim
QUOTE (Vertigo @ Jul 5 2005, 01:32 PM)
I think we need to send Guruboolez the HA Controversy Award, he's certainly earned it over the years.


More than me? I'm desolated! :-B
Vertigo
QUOTE (rjamorim @ Jul 5 2005, 08:51 AM)
QUOTE (Vertigo @ Jul 5 2005, 01:32 PM)
I think we need to send Guruboolez the HA Controversy Award, he's certainly earned it over the years.


More than me? I'm desolated! :-B



No, your HA Shit-stirrer award is in the mail.
guruboolez
QUOTE
Well haven't you gotten polite answers in general?  You had one argument, but other people are reading and listening, and are interested in hearing more.



In general, yes. Fortunately tongue.gif

QUOTE
You don't have to participate anymore if you don't want to, but I wouldn't stop because of a single case.  Up to you.

I'll think about it, but currently, I confess that I'm not really in the mood to continue with MPC.

QUOTE
Hey, I can change my mind, right? smile.gif

re- wink.gif
QUOTE
The radical emphasis on exclusive preset use in LAME originally had to do with the fact that LAME had many exposed experimental switches mixed in with regular switches.  There was never a very good conceptual separation between the two, and many were undocumented or poorly documented.

OK for LAME. But MPC is in the same situation. What was the name of the adaptive behaviour introduced by Klemm in mppenc with the 0.90s or so, which downgraded the profile name when some switches were added to the simple preset? "Idiot-proof", if I remember. The use of personal switches or command lines was ~always discouraged, even non-experimental ones. Vorbis discouraged the choice of unusual command lines by giving them very long command names. Etc...
Note that I don't think that people shouldn't use personal commands (far from it); I've just recalled that it was, and still is, not recommended here.

A better solution would be a small and intuitive command, working like --ms 15 (stereo image) or --itp for vorbis (sharpness), for people interested in keeping details at low volume (they have to accept their choice, and possible bitrate bloat). --ath_gain could be used as such. Why not update the current recommendation, and communicate about the pertinent use of these command lines? --ms 15 for surround; --ath_gain xx for security with classical?

It's not a very convenient solution for the user, but it's a working one. After all, the recommendations are here to guide the user.

QUOTE
I wonder if quality decreased elsewhere?


I've tested LAME on many samples, and the current encoder performs better with most of them. There's just one specific issue with lame 3.97a10 (warbling), but I'm not sure that it's linked to the gain in quality that can be noticed on quiet parts. But there may be problems I haven't noticed. I may run more extensive tests between 3.90.3 and 3.97 once the latter reaches final or beta status.

QUOTE
One way to get an indication though is to look at Vorbis performance in this situation before the changes were made and after the changes were made that "fixed" the problem. Did the bitrate increase? (I don't remember)
Yes, and no. The bitrate increases a lot between 1.00 and 1.01 (the fixed encoder) on the affected material (at -q0, bitrate was ~30kbps and then jumped to a more conventional ~60 kbps), but didn't change with louder material. Same for MPCq5 with and without --ath_gain 14: 110...130 -> 170...180 kbps IIRC. With classical stuff (full tracks), it leads to a +10 kbps inflation (approximation). I can't say for other musical genre.
Jebus
I think there seems to just be an issue with ATH and replaygain, period. If you hard-set the value (using --scale in lame.... sorry guys i'm not familiar with MPC at all), then the codec should be able to adjust the ATH curve based on the replaygain values used. This is why i brought up the whole --scale thing in the first place.

I seriously think adjusting gain after the codec had already used an ATH based on the original levels, is always going to have the potential for problems. The best work-around would be to add a --safe-quiet switch or something so that it uses a stricter ATH. But that's going to increase bitrate, of course.
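
The reasoning can be sketched in a few lines of Python. This is a simplification that assumes a fixed relation between digital level and playback level, treats the ATH as a per-band value in dB, and uses hypothetical function names; no real encoder interface is implied.

```python
from typing import List, Sequence


def compensated_ath_db(ath_db_per_band: Sequence[float],
                       playback_gain_db: float) -> List[float]:
    """Shift each band's threshold by the gain that playback will apply.

    A positive gain (quiet track made louder) lowers the thresholds, so the
    encoder hides less; a negative gain raises them.
    """
    return [threshold - playback_gain_db for threshold in ath_db_per_band]


def noise_margin_after_gain(noise_db: float,
                            ath_db: float,
                            playback_gain_db: float) -> float:
    """How far the coding noise ends up above the threshold once gain is applied.

    A positive result means noise that was hidden at encode time becomes
    potentially audible after the gain.
    """
    return (noise_db + playback_gain_db) - ath_db
```

For example, noise placed 2 dB under the threshold at encode time sits 18 dB above it after a +20 dB gain, which matches the kind of failure described in this thread. A --scale style approach avoids this by applying the gain before the encoder sees the signal.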
CiTay
I got a new e-Mail from Frank (he follows this thread):

QUOTE
Title-based Replaygain in its current form is completely useless for classical music. I could implement a couple of small, sensible modifications.

The whole thing has nothing to do with "merging" album- and track gain. Album Gain's current concept is correct, Track Gain should actually have a whole different concept.

First of all there's the question: Are modifications of loudness by more than +12 dB (for silent material) and -12 dB (for loud material) even reasonable, and are these even intended volume differences? This is also put into question if tracks are relatively short (i.e. clearly shorter than neighboring ones) and e.g. contain only announcements. Here, an adaptive method leads to better results.

Also, there are severe problems if the tracks aren't strictly separated or when the most silent spot doesn't lie at the title border/cut. Moreover, the results are completely different when the same material is a) available as a CD with two big tracks or b) available as a CD with many small tracks.

Here we need another (more complex) solution, especially for classical music.

For one thing, it has to be possible to (slowly) change the loudness within a track (because it's e.g. 44 minutes long), and also, it has to be possible to steadily change the loudness, in order to have gapless title gain with live concerts.

This then amounts to a dynamic range compression similar to AC3, which however means more than 16 bit in the header, because we need to have constant control data.

For Matroska, here would be a proposition:

* Determine and apply album-based Replaygain (control range between K+26 and K-6) [with current nomenclature: +12 to -20 dB]
* Title-based Replaygain is an additional control signal, which is additionally applied
* Restrict title-based Replaygain to +-12 dB
* Title-based Replaygain can be changed steadily within a track
* For that, a control signal similar to an automatic level adjustment is calculated, with the following differences:
  - slow adjustment rate with normal levels
  - level-dependent maximum adjustment rate (e.g. 6 dB/sec at <-80 dB, 1.5 dB/sec at -60 dB, 0.4 dB/sec at -40 dB, 0.1 dB/sec at -20 dB, 0.02 dB/sec at >0 dB)
    - these levels are relative to the album-based Replaygain
  - adjustment considers volume jumps in both time directions
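
The level-dependent rate limit in the list above can be sketched in Python. The five breakpoints are the example figures from the mail; how to interpolate between them is not stated, so interpolating linearly in log10(rate) is an assumption, as are the function names. The sketch runs forward in time only, so it does not cover the "both time directions" point.

```python
import math
from typing import List, Sequence

# (level in dB relative to album gain, max adjustment rate in dB/sec),
# taken from the example figures in the proposal.
RATE_TABLE = [(-80.0, 6.0), (-60.0, 1.5), (-40.0, 0.4), (-20.0, 0.1), (0.0, 0.02)]


def max_rate_db_per_sec(level_db: float) -> float:
    """Maximum allowed gain change rate at a given signal level."""
    if level_db <= RATE_TABLE[0][0]:
        return RATE_TABLE[0][1]
    if level_db >= RATE_TABLE[-1][0]:
        return RATE_TABLE[-1][1]
    for (lo_level, lo_rate), (hi_level, hi_rate) in zip(RATE_TABLE, RATE_TABLE[1:]):
        if lo_level <= level_db <= hi_level:
            frac = (level_db - lo_level) / (hi_level - lo_level)
            # Assumed interpolation: linear in log10(rate).
            log_rate = math.log10(lo_rate) + frac * (math.log10(hi_rate) - math.log10(lo_rate))
            return 10.0 ** log_rate
    raise AssertionError("unreachable: table covers the whole range")


def slew_limited_gain(target_gain_db: Sequence[float],
                      level_db: Sequence[float],
                      dt_sec: float,
                      start_gain_db: float = 0.0) -> List[float]:
    """Follow the target gain, moving no faster than the level allows."""
    gain = start_gain_db
    result = []
    for target, level in zip(target_gain_db, level_db):
        step = max_rate_db_per_sec(level) * dt_sec
        gain += max(-step, min(step, target - gain))
        result.append(gain)
    return result
```

Near silence the gain can move by 6 dB in one second, while during loud passages it barely moves at all, which is why a change of this kind would mostly happen in the gaps and quiet stretches.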
 

Result:

* Normal pop music with digital zero between the tracks isn't handled much differently, maybe the dynamics are lowered by max. 1 dB/minute.
  - There are differences with long titles that have bigger dynamics
* The outcome doesn't depend on where the cuts/title borders are (important!)
  - in particular, there are no problems when they're at the wrong place or when it's a live concert without cuts
* With classical music, the outcome is much closer to what you would want when you listen to Beethoven in the subway.

But this thing is something that would be more for Matroska than for the MPC format.
 
Concerning another topic in the discussion: Encoding switches can be changed without making the file lose its "profile" info. You just must not decrease the quality with the switches.
Gabriel
Isn't the purpose of track gain to be able to use the tracks in a compilation of tracks from different albums?
In this case, why would track gain be dependent on the gain of its neighbouring tracks in the original album? If someone wants to listen to a full album, he would use album gain, not track gain.
Gambit
It seems funny to me that you would try to fix Replaygain, when the problem is clearly on Musepack's side.