An idea of audio encode algorithm, based on maximum allowed volume of , WavPack hybrid mode test included |
Mar 25 2013, 03:10
Post
#51
|
|
|
Group: Members Posts: 48 Joined: 19-July 12 Member No.: 101579 |
Convert some CD audio to 8 bit/sample, that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs. If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible. The audio quality of this approach will already be poor. If you're also going to use EQ, you should absolutely apply it BEFORE you encode. Otherwise you will need to tolerate much lower quality or else even higher bitrates. Yes, but I did not say that the -45 dB given by this approach and the -45 dB given by WavPack hybrid are of the same quality. Definitely they are not. I wrote only about WavPack, which I had tested. You seem to be looking for some form of holy grail of lossy audio encoding: great compression, zero artifacts, super simple algorithm. Many smart people have spent a lot of time and effort to give us good compression and few artifacts. But the algorithms involved usually aren't very simple. No, the idea is to run existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the evaluation of the result should take these properties into account. QUOTE It depends on what to call "transparent". The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. 
Setting aside how patently absurd that idea is since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that. At least develop a consistent narrative before you try to make everyone implement it at your behest. I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible for all humans (not dogs, cats, snakes etc.). We are humans, so there are limits to our perception. If you do not hear the difference, it does not mean that it is not there, and if the difference is there, it does not mean that you (or any human) can hear it. Audio listening is an objective thing. Usually people do not hear the difference because they are not attentive, patient etc. enough. They actually can do it, but a quiet mind is needed first. In my opinion, for an encoder there should not be any exceptions in the input audio (when you are trying to substitute for lossless). Otherwise, use Opus at 208 kbps and be happy. It gives high quality for all types of music. Are you saying that lossyWAV standard without noise shaping is transparent? I cannot say for sure. With a 32 kHz sample it is audible; for 44 kHz and higher it probably is not, but deeper tests are needed. (With adaptive noise shaping, 44 kHz is audible.) QUOTE If I've understood you correctly, I think it's the closest thing you're going to get to your goal. Yes, as far as I have tested it, it can be safely used instead of lossless. |
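For readers who want to check the bit-depth figures in the exchange above, here is a minimal sketch (not something posted in the thread). It rounds a full-scale sine to a given word length without dither and reports the error power relative to the signal. The function name, the 997 Hz test tone and the 48 kHz rate are assumptions made for illustration. The usual rule of thumb for this measurement is about 6.02 × bits + 1.76 dB, so roughly 50 dB for 8 bits and 98 dB for 16 bits; a peak-difference measure, or music that sits well below full scale, will give different numbers.

```python
import math


def quantization_noise_db(bits, n=48000, freq=997.0, rate=48000.0):
    """Error power, in dB relative to a full-scale sine, after undithered
    rounding to `bits` bits. All parameters are illustrative assumptions."""
    scale = 2 ** (bits - 1) - 1          # largest positive integer code
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / rate)   # full-scale test sine
        q = round(x * scale) / scale                   # quantize, no dither
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10 * math.log10(noise_power / signal_power)


if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits:", round(quantization_noise_db(bits), 1), "dB")
```

A single broadband figure like this says nothing about where the noise sits in the spectrum, which is the point made above about dither and equalization.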
|
|
|
Mar 25 2013, 16:12
Post
#52
|
|
![]() Group: Developer Posts: 2986 Joined: 2-December 07 Member No.: 49183 |
|
|
|
|
Mar 25 2013, 16:44
Post
#53
|
|
|
Group: Members Posts: 951 Joined: 6-September 04 Member No.: 16817 |
No, the idea is to run existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the evaluation of the result should take these properties into account. Surely this could leave said program running indefinitely due to never matching the criteria? How do you define 'proper' for every possible type of audio? I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible for all humans (not dogs, cats, snakes etc.). So you don't think that when someone says something is 'transparent' to them, they mean that all artifacts are 'inaudible' to them? I don't think you understand what transparency is. This post has been edited by probedb: Mar 25 2013, 16:45 |
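The "encode repeatedly at increasing bitrate" proposal, and the worry above that it might never terminate, can be sketched as follows. This is only an illustration under stated assumptions: `encode_decode` and `is_acceptable` are hypothetical callables supplied by the caller, not the API of any real encoder, and the toy stand-ins at the bottom are not a real codec or a real audibility test. Walking a finite bitrate ladder guarantees termination; when nothing passes, the function returns `None` so the caller can fall back to lossless. What `is_acceptable` should actually compute is the open question in this thread.

```python
from typing import Callable, Iterable, Optional, Sequence

Samples = Sequence[float]


def lowest_acceptable_bitrate(
    original: Samples,
    encode_decode: Callable[[Samples, int], Samples],   # hypothetical codec round trip
    is_acceptable: Callable[[Samples, Samples], bool],  # hypothetical quality check
    bitrates: Iterable[int],
) -> Optional[int]:
    """Return the lowest bitrate whose decoded output passes the check,
    or None if no bitrate on the (finite) ladder passes."""
    for kbps in sorted(bitrates):
        decoded = encode_decode(original, kbps)
        if is_acceptable(original, decoded):
            return kbps
    return None  # caller should fall back to lossless here


# Toy stand-ins, for demonstration only.
def toy_round_trip(samples, kbps):
    step = 1.0 / kbps                     # coarser steps at lower "bitrates"
    return [round(s / step) * step for s in samples]


def peak_difference_below(threshold_db):
    limit = 10 ** (threshold_db / 20)
    def check(original, decoded):
        return max(abs(a - b) for a, b in zip(original, decoded)) <= limit
    return check


if __name__ == "__main__":
    import math
    tone = [math.sin(0.05 * i) for i in range(1000)]
    print(lowest_acceptable_bitrate(tone, toy_round_trip,
                                    peak_difference_below(-45.0),
                                    [64, 96, 128, 192, 256]))
```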
|
|
|
Mar 25 2013, 18:22
Post
#54
|
|
![]() Group: Members Posts: 913 Joined: 15-December 01 From: Germany Member No.: 662 |
No, the idea is to run existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime. But how does the external program determine whether the result is "proper"? Simple approaches will not be both near-transparent and efficient. You need something more complex to achieve both simultaneously. Neither simple nor complex approaches can guarantee transparency, as long as you are not exactly reproducing the input signal. |
|
|
|
Mar 28 2013, 04:59
Post
#55
|
|
![]() WavPack Developer Group: Developer (Donating) Posts: 1219 Joined: 3-January 02 From: San Francisco CA Member No.: 900 |
I tried the old inversion trick on a drum & bass song and you can hear percussive elements; in my example you can clearly make out the snare hits. There seem to be no tonal elements. On an a-capella song you hear broadband noise with a similar amplitude response as the original. "s" sounds in the original produce short bursts of high pitched noise in the difference file. Sorry that I wasn't following this thread more closely and so missed this. What you are hearing is the dynamic noise shaping feature that measures high-frequencies in the source and tilts the spectral balance of the quantization noise (generated by the lossy mode) up or down in an attempt to have it more likely masked by the source audio. This gave a nice improvement to some samples where high-frequency transients would sometimes result in nasty bursts of low-frequency noise. It's very much like the adaptive noise shaping of LossyWAV, but simpler. On the broader question, there are several operations and parameters in the lossy mode that I have added over the years (with lots of help from people with much better hearing than mine) to improve the transparency of the lossy mode, and they're all based on psychoacoustic principles, but that doesn't make it a psychoacoustic codec, IMO, because it doesn't implement any hearing model and it has no VBR mode wherein the bitrate is altered according to some estimate of perceptual quality. In any event, it's not a purely mathematical operation like ADPCM either, so saying that it has a weak psychoacoustic model certainly would not bother me. |
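The "inversion trick" mentioned in the post above amounts to subtracting the decoded signal from the original, sample by sample, and listening to or measuring what is left. A minimal sketch follows, assuming both signals are already time-aligned lists of floats in the range -1.0 to 1.0 at the same sample rate; the function names are made up for illustration. As the noise-shaping explanation shows, the level of the difference alone does not say how audible it is once it is mixed back with the source.

```python
import math


def difference_signal(original, decoded):
    """Sample-wise difference, equivalent to inverting one signal and
    mixing it with the other. Assumes the inputs are time-aligned."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    return [a - b for a, b in zip(original, decoded)]


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")              # digital silence
    return 10 * math.log10(mean_square)
```

Computing `rms_dbfs(difference_signal(original, decoded))` over short windows, rather than over the whole file, is one way to see the percussive bursts described above.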
|
|
|
Mar 28 2013, 07:20
Post
#56
|
|
![]() Group: Super Moderator Posts: 9265 Joined: 1-April 04 Member No.: 13167 |
Thanks for chiming-in, David!
-------------------- Everything sounds the same until it is proven otherwise.
|
|
|
|
Mar 28 2013, 10:34
Post
#57
|
|
![]() ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409 |
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine. Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event QUOTE and air molecules' vibration is a sampled and quantized lossy reproduction of the actual instrument's vibrations... I don't want to go that far. But, if a tree falls in the forest when nobody's there, does it make any noise? QUOTE As I see it, sound, or better, music production is all about psychoacoustics and the reference model is always our hearing system, so arguing about the bandwidth and SNR limits of the CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in the composer's or player's or instrument builder's mind in the first place! My point was that, in a really esoteric discussion like this one, we have to be 100% clear what we mean by transparent, and what we mean by psychoacoustic model. It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous. If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word. Cheers, David. |
|
|
|
Mar 28 2013, 13:41
Post
#58
|
|
|
Group: Super Moderator Posts: 4353 Joined: 23-June 06 Member No.: 32180 |
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word. Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.
|
|
|
|
Mar 28 2013, 17:30
Post
#59
|
|
![]() ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409 |
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word. Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals. HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3. Cheers, David. P.S. apparently the ability to hear 23-24kHz, at levels of around 80-100dB SPL, is quite widespread in younger people (e.g. under 25). Normal hi-fi can't reproduce it, and I can't imagine why anyone would want to listen to it - but then, some people would never want to listen to undial.wav, or Aphex Twin, or Merzbow, or... |
|
|
|
Mar 28 2013, 17:38
Post
#60
|
|
|
Group: Super Moderator Posts: 4353 Joined: 23-June 06 Member No.: 32180 |
Ah, you see, I knew that would be the response, and fair enough. However, nowhere in the definition of transparency that I would expect to use does it say "actual music signals" - the idea is that, for some signal, at some bitrate, for some listener, using some equipment, in some usage scenario, the codec is transparent […] The concept of complete transparency (all signals/listeners/equipment/scenarios) is quite unattainable IMO - except for a mathematically lossless transformation. Yet the OP seems to want complete transparency, and thinks a computer programme is going to be able to judge when this is achieved. Good points. I was just defending CDDA as a musical medium since you obviously aren’t denying its suitability but it’s always possible that someone might take that quote wrongly. QUOTE HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3. Heh, also a good point. Again, I’m definitely not denying the utility of synthetic signals in increasing the technical quality of an encoder, and possibly in a way that carries over to more commonly heard material. It just helps to avoid placing too much emphasis on pure tones when, as I said, the ability to hear them need not reflect the actual lowpasses someone can discern in real material comprising complex waveforms, and also, pure tones aren’t likely to give encoders much trouble in comparison to synthetically concocted complex tones.
|
|
|
|
Mar 28 2013, 17:45
Post
#61
|
|
![]() Group: Members (Donating) Posts: 1442 Joined: 11-February 03 From: Vermont Member No.: 4955 |
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word. Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals. First, music is not by definition multiple tones at once. It could be a single line melody, or even just rhythm on a single note. Or a CD could have non-musical audio. Second, depending on masking to make it transparent takes it into the realm of lossy, which was the original point of this sub-topic. Third, why limit the domain to people old enough to have presumably reduced hearing? |
|
|
|
Mar 28 2013, 17:50
Post
#62
|
|
|
Group: Super Moderator Posts: 4353 Joined: 23-June 06 Member No.: 32180 |
First, music is not by definition multiple tones at once. It could be a single line melody, or even just rhythm on a single note. Or a CD could have non-musical audio. OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves. QUOTE Third, why limit the domain to people old enough to have presumably reduced hearing? I have no desire to do this. I was making some general and simplistic points about the ability to hear pure tones at a given frequency vs. that frequency’s relevance in common types of material containing multiple harmonics. I’m not trying to reshape how people develop codecs or claim that I know better. Developers are obviously free to test and process in whichever ways and on whichever types of material, ‘realistic’ or not, they choose. softrunner in particular might need some radical new methodologies to get this project off the ground… |
|
|
|
Mar 28 2013, 19:42
Post
#63
|
|
|
Xiph.org Speex developer Group: Developer Posts: 431 Joined: 21-August 02 Member No.: 3134 |
It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous. No, what made the OP ridiculous is that it's proposing a "new" idea that not only ignores all advances that have been made in the past 40 years, but even shows a misunderstanding of what was known 40 years ago. Hint: G.711 (mu-law/A-law) is 40 years old and even at that time it was known that the noise energy had to be modulated with the signal amplitude and that constant-level noise is a dumb idea. |
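To illustrate the point above about noise energy following signal amplitude, here is a small sketch of mu-law companding. It uses the continuous mu-law formula with mu = 255 and a uniform quantizer in the companded domain; it is not the bit-exact segmented encoding that G.711 specifies, and the function names are made up for illustration. The round-trip error is small for quiet samples and larger for loud ones, so the noise scales with the signal instead of sitting at a constant level.

```python
import math

MU = 255.0  # mu-law parameter used for 8-bit telephony


def mulaw_compress(x):
    """Map x in [-1, 1] to the companded domain, also [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mulaw_round_trip(x, bits=8):
    """Compress, quantize uniformly to `bits` bits, then expand."""
    levels = 2 ** (bits - 1) - 1
    y = round(mulaw_compress(x) * levels) / levels
    return mulaw_expand(y)


if __name__ == "__main__":
    for amplitude in (0.01, 0.1, 0.9):
        error = abs(mulaw_round_trip(amplitude) - amplitude)
        print(amplitude, error)
```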
|
|
|
Mar 28 2013, 21:19
Post
#64
|
|
![]() ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409 |
I think he implied a noise floor relative to peak level, or something. That would be NICAM.
Cheers, David. |
|
|
|
Mar 28 2013, 21:49
Post
#65
|
|
|
Xiph.org Speex developer Group: Developer Posts: 431 Joined: 21-August 02 Member No.: 3134 |
I think he implied a noise floor relative to peak level, or something. That would be NICAM. Well, the OP mentions working in chunks of 1 second, which would be pretty useless for setting a relative floor. So far less advanced than mu-law (1972) and NICAM (which according to Wikipedia is from 1964). I guess that makes the idea worse than 50 year old technology. But that's OK, I've got a much better idea involving wax and needles |
|
|
|
Mar 28 2013, 22:29
Post
#66
|
|
![]() Group: Members Posts: 379 Joined: 16-December 10 From: Palermo Member No.: 86562 |
First, music is not by definition multiple tones at once. It could be a single line melody, or even just rhythm on a single note. Or a CD could have non-musical audio. OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves. I'm definitely with your idea of music, but to be honest, after Cage's 4'33'' I wouldn't be that surprised if someone came out with, say, "6279000", a composition made of a single 23kHz tone. And maybe someone else will rush to buy it in HD format... -------------------- ... I live by long distance.
|
|
|
|
Mar 29 2013, 12:16
Post
#67
|
|
![]() ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409 |
I've got a much better idea involving wax and needles Oh, get with the times http://www.youtube.com/watch?v=ik8sJds4hV8 Cheers, David. |
|
|
|
|