
Topic: An idea of audio encode algorithm, based on maximum allowed volume of

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #50
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible.
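
(Aside: for anyone who wants to try this themselves, here is a rough sketch of the experiment: requantize 16-bit samples to 8 bits with and without TPDF dither and measure the error level relative to full scale. Python with NumPy; the synthetic test signal is only a stand-in for real CD audio, so the exact figures will differ from the ones quoted above.)

Code:
# Rough sketch: requantize 16-bit PCM to 8 bits, with and without TPDF dither,
# and report the RMS error relative to 16-bit full scale. Mono assumed.
import numpy as np

def to_8bit(x16, dither=False):
    """x16: int16 samples; returns the 8-bit-quantized signal back at 16-bit scale."""
    x = x16.astype(np.float64) / 256.0                      # scale to the 8-bit range
    if dither:
        # TPDF dither: difference of two uniform variables, peak of +/-1 LSB at 8-bit scale
        x = x + (np.random.random(x.shape) - np.random.random(x.shape))
    q = np.clip(np.round(x), -128, 127)                     # 8-bit uniform quantizer
    return q * 256.0                                        # back to 16-bit scale

def error_floor_db(x16, dither):
    err = to_8bit(x16, dither) - x16.astype(np.float64)
    rms = np.sqrt(np.mean(err ** 2))
    return 20 * np.log10(rms / 32768.0)                     # dB relative to full scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    test = (rng.standard_normal(44100) * 8000).astype(np.int16)   # stand-in for real audio
    print("no dither  :", round(error_floor_db(test, False), 1), "dBFS")
    print("TPDF dither:", round(error_floor_db(test, True), 1), "dBFS")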

The audio quality of this approach will already be poor.  If you're also going to use EQ, you should absolutely apply it BEFORE you encode.  Otherwise you will need to tolerate much lower quality or else even higher bitrates.

Yes, but I did not say that the -45 dB given by this approach and the -45 dB given by WavPack hybrid are of the same quality. Definitely they are not. I wrote only about WavPack, which I had tested.

You seem to be looking for some form of holy grail of lossy audio encoding: great compression, zero artifacts, super simple algorithm. Many smart people have spent a lot of time and effort to give us good compression and few artifacts. But the algorithms involved usually aren't very simple.

No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the way the result is evaluated should take these properties into account.
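
(A minimal sketch of that outer loop, just to make the proposal concrete. The encoder call and the audibility check are passed in as parameters because both are hypothetical placeholders here, and a bitrate cap with a lossless fallback keeps the loop from running forever.)

Code:
# Sketch of the proposed outer loop: re-encode at increasing bitrates until an
# audibility check passes; fall back to lossless if even the highest setting fails.
# encode_at() and difference_is_inaudible() are hypothetical placeholders, not real tools.

def encode_until_acceptable(source_wav, encode_at, difference_is_inaudible,
                            bitrates=(96, 128, 160, 192, 256, 320)):
    """
    source_wav                    -- path to the original file
    encode_at(path, kbps)         -- runs some existing lossy encoder, returns the encoded path
    difference_is_inaudible(a, b) -- the disputed 'proper result' test
    Returns (encoded_path, kbps), or (None, None) meaning 'keep the lossless original'.
    """
    for kbps in bitrates:
        encoded = encode_at(source_wav, kbps)
        if difference_is_inaudible(source_wav, encoded):
            return encoded, kbps
    return None, None   # no setting passed the check: keep the lossless original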

Quote
It depends on what you call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that. At least develop a consistent narrative before you try to make everyone implement it at your behest.

I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible for all humans (not dogs, cats, snakes, etc.). We are humans, so there are limits to our perception. If you do not hear the difference, that does not mean it is not there, and if the difference is there, that does not mean you (or any other human) can hear it. Audibility is an objective thing. Usually people do not hear the difference because they are not attentive, patient, etc. enough. They actually can hear it, but a silent mind is needed first.
In my opinion, an encoder should not have any exceptions for input audio (if it is meant to replace lossless). Otherwise, use Opus at 208 kbps and be happy. It gives high quality for all types of music.

Are you saying that lossyWAV standard without noise shaping is transparent?

I cannot say for sure. At a 32 kHz sample rate it is audible; at 44 kHz and higher it is probably not, but deeper tests are needed. (With adaptive noise shaping, 44 kHz is audible.)
Quote
If I've understood you correctly, I think it's the closest thing you're going to get to your goal.

Yes, as far as I have tested it, it can safely be used instead of lossless.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #51
And the decision of how proper the result is should be made by a computer program at runtime.

So... where can I download this program?

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #52
No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the way the result is evaluated should take these properties into account.

Surely this could leave said program running indefinitely due to never matching the criteria? How do you define 'proper' for every possible type of audio?

I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible for all humans (not dogs, cats, snakes, etc.).

So when someone says something is 'transparent' to them, you don't think they mean that all artifacts are 'inaudible' to them? I don't think you understand what transparency is.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #53
No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime.

But how does the external program determine whether the result is "proper"? Simple approaches will not be both near-transparent and efficient. You need something more complex to achieve both simultaneously. Neither simple nor complex approaches can guarantee transparency, as long as you are not exactly reproducing the input signal.
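
(To make that concrete, here is the kind of simple check people usually reach for first: decode, subtract, and compare the RMS level of the difference to a fixed threshold. It is only an illustration; because it ignores masking entirely, it can pass encodes with audible artifacts and reject encodes with inaudible ones, which is exactly the problem described above.)

Code:
# Deliberately naive 'proper result' test: RMS level of the difference signal against
# a fixed threshold. Simple, but blind to masking, so it is neither a reliable
# transparency test nor bitrate-efficient.
import numpy as np

def naive_difference_check(original, decoded, threshold_db=-80.0):
    """original, decoded: float arrays in [-1, 1], assumed same length and time-aligned."""
    diff = decoded - original
    rms = np.sqrt(np.mean(diff ** 2)) + 1e-12        # avoid log(0) on identical signals
    return 20 * np.log10(rms) < threshold_db         # True = accept this encode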

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #54
I tried the old inversion trick on a drum & bass song and you can hear percussive elements; in my example you can clearly make out the snare hits. There seem to be no tonal elements.

On an a cappella song you hear broadband noise with an amplitude envelope similar to the original's. "s" sounds in the original produce short bursts of high-pitched noise in the difference file.
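
(For anyone unfamiliar with the "inversion trick": it is just a null test. Invert the decoded file and mix it with the original, i.e. subtract the two, then listen to the residual. A rough sketch, assuming the two files are already time-aligned and using soundfile as one possible WAV reader:)

Code:
# Null test sketch: write original minus decoded so the residual can be auditioned.
# Assumes both files are already time-aligned with the same sample rate and channel count.
import soundfile as sf   # just one convenient WAV reader/writer; any other will do

def write_difference(original_wav, decoded_wav, diff_wav):
    a, sr_a = sf.read(original_wav, dtype="float64")
    b, sr_b = sf.read(decoded_wav, dtype="float64")
    assert sr_a == sr_b, "sample rates must match"
    n = min(len(a), len(b))
    sf.write(diff_wav, a[:n] - b[:n], sr_a)          # what remains is the lossy error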

Sorry that I wasn't following this thread more closely and so missed this. What you are hearing is the dynamic noise shaping feature, which measures high-frequency content in the source and tilts the spectral balance of the quantization noise (generated by the lossy mode) up or down in an attempt to make it more likely to be masked by the source audio. This gave a nice improvement on some samples where high-frequency transients would sometimes result in nasty bursts of low-frequency noise. It's very much like the adaptive noise shaping of lossyWAV, but simpler.
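
(Not WavPack's actual code, but a toy illustration of the principle: a first-order error-feedback quantizer whose single coefficient tilts the quantization-noise spectrum up or down. A real implementation would pick the coefficient per block from the high-frequency content of the source.)

Code:
# Toy illustration only, not WavPack's implementation: first-order error-feedback
# noise shaping. c near +1 pushes quantization noise toward high frequencies,
# c near -1 toward low frequencies, c = 0 leaves it flat.
import numpy as np

def shaped_quantize(x, step, c):
    """x: float signal, step: quantizer step size, c: shaping coefficient in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty(len(x), dtype=np.float64)
    e_prev = 0.0
    for n in range(len(x)):
        w = x[n] - c * e_prev                 # feed the previous quantization error back
        y[n] = np.round(w / step) * step      # uniform quantizer
        e_prev = y[n] - w                     # error that gets shaped into the next sample
    return y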

On the broader question, there are several operations and parameters in the lossy mode that I have added over the years (with lots of help from people with much better hearing than mine) to improve the transparency of the lossy mode, and they're all based on psychoacoustic principles, but that doesn't make it a psychoacoustic codec, IMO, because it doesn't implement any hearing model and it has no VBR mode wherein the bitrate is altered according to some estimate of perceptual quality.

In any event, it's not a purely mathematical operation like ADPCM either, so saying that it has a weak psychoacoustic model certainly would not bother me. 


An idea of audio encode algorithm, based on maximum allowed volume of

Reply #56
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.
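
(A quick back-of-envelope check of where the starred figures come from: the textbook SNR of an ideal N-bit uniform quantizer and the Nyquist limit. The 20 kHz / 90 dB in the post are simply round, conservative versions of these.)

Code:
# Back-of-envelope check of the starred CD figures: ideal SNR of an N-bit uniform
# quantizer is roughly 6.02*N + 1.76 dB, and the bandwidth limit is half the sample rate.
bits, fs = 16, 44100
print("ideal 16-bit SNR:", round(6.02 * bits + 1.76, 1), "dB")   # about 98 dB
print("Nyquist limit   :", fs / 2 / 1000, "kHz")                 # 22.05 kHz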

Kind of a reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event
...which it absolutely is...
Quote
and the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instrument's vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise?
I don't want to go that far.  I only care about what we can hear. Which is exactly what you said...

Quote
As I see it, sound (or better, music) production is all about psychoacoustics, and the reference model is always our hearing system, so arguing about the bandwidth and SNR limits of the CD format (*) means being willing to (re)produce something that not only nobody could realistically hear, but that wasn't even in the composer's or player's or instrument builder's mind in the first place!
My point was that, in a really esoteric discussion like this one, we have to be 100% clear what we mean by transparent, and what we mean by psychoacoustic model. It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous.

If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.

Cheers,
David.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #57
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #58
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.
Ah, you see, I knew that would be the response, and fair enough. However, nowhere in the definition of transparency that I would expect to use does it say "actual music signals" - the idea is that, for some signal, at some bitrate, for some listener, using some equipment, in some usage scenario, the codec is transparent (i.e. indistinguishable from the original in a double-blind test) - and through time and generalised testing, the probability emerges that the codec is transparent for most signals/listeners/equipment/scenarios at a given bitrate. The concept of complete transparency (all signals/listeners/equipment/scenarios) is quite unattainable IMO - except for a mathematically lossless transformation. Yet the OP seems to want complete transparency, and thinks a computer programme is going to be able to judge when this is achieved.

HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3.

Cheers,
David.

P.S. apparently the ability to hear 23-24kHz, at levels of around 80-100dB SPL, is quite widespread in younger people (e.g. under 25). Normal hi-fi can't reproduce it, and I can't imagine why anyone would want to listen to it - but then, some people would never want to listen to undial.wav, or Aphex Twin, or Merzbow, or...

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #59
Ah, you see, I knew that would be the response, and fair enough. However, nowhere in the definition of transparency that I would expect to use does it say "actual music signals" - the idea is that, for some signal, at some bitrate, for some listener, using some equipment, in some usage scenario, the codec is transparent […] The concept of complete transparency (all signals/listeners/equipment/scenarios) is quite unattainable IMO - except for a mathematically lossless transformation. Yet the OP seems to want complete transparency, and thinks a computer programme is going to be able to judge when this is achieved.
Good points. I was just defending CDDA as a musical medium; you obviously aren't denying its suitability, but it's always possible that someone might take that quote the wrong way.

Quote
HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3.
Heh, also a good point. Again, I'm definitely not denying the utility of synthetic signals in improving the technical quality of an encoder, and possibly in a way that carries over to more commonly audible material. It just helps to avoid placing too much emphasis on pure tones when, as I said, the ability to hear them need not reflect the actual lowpasses someone can discern in real material comprising complex waveforms, and also, pure tones aren't likely to give encoders much trouble in comparison to synthetically concocted complex tones.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #60
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.


First, music is not by definition multiple tones at once. It could be a single-line melody, or even just a rhythm on a single note. Or a CD could contain non-musical audio.

Second, depending on masking to make it transparent takes it into the realm of lossy, which was the original point of this sub-topic.

Third, why limit the domain to people old enough to have presumably reduced hearing?

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #61
First, music is not by definition multiple tones at once. It could be a single-line melody, or even just a rhythm on a single note. Or a CD could contain non-musical audio.
OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves.

Quote
Third, why limit the domain to people old enough to have presumably reduced hearing?
I have no desire to do this.

I was making some general and simplistic points about the ability to hear pure tones at a given frequency vs. that frequency’s relevance in common types of material containing multiple harmonics. I’m not trying to reshape how people develop codecs or claim that I know better. Developers are obviously free to test and process in whichever ways and on whichever types of material, ‘realistic’ or not, they choose. softrunner in particular might need some radical new methodologies to get this project off the ground…

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #62
It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous.


No, what made the OP ridiculous is that it proposed a "new" idea that not only ignores all the advances made in the past 40 years, but even shows a misunderstanding of what was known 40 years ago. Hint: G.711 (mu-law/A-law) is 40 years old, and even at that time it was known that the noise energy had to be modulated with the signal amplitude and that constant-level noise is a dumb idea.
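
(For readers who haven't met G.711: the point is that mu-law companding warps the signal before an 8-bit uniform quantizer, so the effective step size, and therefore the quantization noise, grows and shrinks with the signal amplitude instead of sitting at one constant level. A toy sketch with the standard mu-law curve, mu = 255:)

Code:
# Toy illustration of the G.711 point: mu-law companding before an 8-bit uniform
# quantizer makes quantization noise roughly track the signal amplitude rather than
# staying at a constant level. Standard mu-law curve, mu = 255.
import numpy as np

MU = 255.0

def mu_compress(x):                                   # x in [-1, 1]
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_expand(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def mu_law_quantize(x, bits=8):
    levels = 2 ** (bits - 1)
    y = np.round(mu_compress(x) * levels) / levels    # uniform quantizer in the warped domain
    return mu_expand(y)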

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #63
I think he implied a noise floor relative to peak level, or something. That would be NICAM.

Cheers,
David.

 

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #64
I think he implied a noise floor relative to peak level, or something. That would be NICAM.


Well, the OP mentions working in chunks of 1 second, which would be pretty useless for setting a relative floor. So far less advanced than mu-law (1972) and NICAM (which, according to Wikipedia, is from 1964). I guess that makes the idea worse than 50-year-old technology. But that's OK, I've got a much better idea involving wax and needles.
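
(The "relative floor done properly" idea, for comparison, works on very short blocks: a per-block scale factor ahead of a uniform quantizer, which is the basic shape of NICAM-style near-instantaneous companding. A toy sketch; with a 1-second block, one loud peak would force a coarse step on everything around it, which is the objection above.)

Code:
# Toy sketch of a per-block 'relative noise floor': each short block is scaled by its
# own peak before uniform quantization, so the step size follows the local level.
# This is only the basic shape of NICAM-style companding, not NICAM itself.
import numpy as np

def block_scaled_quantize(x, block_len, bits=10):
    levels = 2 ** (bits - 1)
    x = np.asarray(x, dtype=np.float64)
    out = np.empty(len(x), dtype=np.float64)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        peak = float(np.max(np.abs(block))) or 1.0           # per-block scale factor
        out[start:start + block_len] = np.round(block / peak * levels) / levels * peak
    return out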

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #65
First, music is not by definition multiple tones at once. It could be a single-line melody, or even just a rhythm on a single note. Or a CD could contain non-musical audio.
OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves.

I'm definitely with your idea of music, but to be honest, after Cage's 4'33'' I won't be that surprised if someone comes out with, say, "6279000", a composition made of a single 23 kHz tone.

And maybe someone else will rush to buy it on HD format...
... I live by long distance.