Vorbis technical discussion

2002-06-20 09:55:10

Quote

Originally posted by HotshotGG

I wasn't trying to provide an in-depth explanation or support for my claim that a transform coder will handle a fatboy signal better. When you say fatboy, do you mean transient signals or stationary signals?

I mean fatboy. The clip. It's a problem case not just because there are some transients in it, but also because of its complex overall harmonic structure.

Quote

I would need more conclusive evidence, which I don't have. I was trying to point out that theoretically it might be possible to use an adaptive transform coder and get results just as good. Not only through window switching, but quite possibly through preclamping the window, or even through the use of a Hybrid Discrete Wavelet Transform which lives purely in the time domain, without having to revert to subbanding at all. I was just describing other ways it may be possible to get near-transparent results without using subbanding, even though subbanding does give good results as it allows for good frequency resolution.

Note that this is very different from what you originally claimed:

Quote

As long as you have a fast enough window switching algorithm you can derive your adaptive transform coder from just a plain MDCT and get just as good results as having to revert to subbands.

I will not argue about what a transform coder using wavelets can do. There is very little research on using wavelets in audio coding, and no one has been successful in doing it to date. Anything I would say would be pure speculation, and I think the same applies to your statements.

As Dibrom already pointed out, you have things again completely in reverse in your last sentence.

And what the hell is 'preclamping the window'?

--
GCP

Reply #1 – 2002-06-20 10:56:44

Quote

As Dibrom already pointed out, you have things again completely in reverse in your last sentence.

OK, I was confused by the whole ordeal with subband coding then. I was reading somewhere that subband coding gives good frequency resolution but poor temporal resolution as you continuously split the subbands, unless it was adaptive transform coders they were speaking of. It can get quite confusing sometimes.

Quote

what the hell is 'preclamping the window'?

I saw this on the Vorbis list archive a while back; once again, a "possibility". I'll quote exactly what Monty said on this. This is what I had been referring to when I was speaking of preclamping the window.

"I'll not stray too far, but I'll mention that preliminary
experiments on using envelope pre-clamping alone to control pre-echo
(described later) produces results apparently as good as absurdly
small blocks. Although I thought of this a while ago, I only got to
try it recently because it seemed like a wild shot in the
dark. Unexpectedly, the results were very good. It's possible we
will be able to get away with a fixed block-size encoder with no
quality penalty!"

Quote

I don't buy the standardization argument either.

That was my whole point in adding the realm of obscure technical possibilities to this post.

Reply #2 – 2002-06-20 11:10:21

Quote

Originally posted by HotshotGG
I saw this on the Vorbis list archive a while back; once again, a "possibility". I'll quote exactly what Monty said on this. This is what I had been referring to when I was speaking of preclamping the window.

"I'll not stray too far, but I'll mention that preliminary
experiments on using envelope pre-clamping alone to control pre-echo
(described later) produces results apparently as good as absurdly
small blocks. Although I thought of this a while ago, I only got to
try it recently because it seemed like a wild shot in the
dark. Unexpectedly, the results were very good. It's possible we
will be able to get away with a fixed block-size encoder with no
quality penalty!"

Ah, this is making more sense to me. I can imagine something with 'envelope pre-clamping'. How recent is the post from Monty? I'd like to know more about this.

--
GCP

Reply #3 – 2002-06-20 12:07:17

Quote

How recent is the post from Monty? I'd like to know more about this.

Actually, this post is a few years old, believe it or not. I had been looking at the overview of Vorbis in the xiph.org archives and I stumbled across it. I found it interesting and wanted to know more about it myself; not a bad idea indeed. Pre-clamping would be adding a series of 0's to the window, if I am not mistaken? Or it may be something else entirely. I was trying to find some sort of correlation between that and what you said to me about knowing the difference between padding and impulse short blocks. I am not sure whether or not they are the same thing.

By the way, while I am asking, you wouldn't happen to have any insight into bitstream scaling, a.k.a. "bitrate peeling"? I left a question in the development forum of the Vorbis section, but apparently nobody has any insight. I thought you might know something about that. I was reading about it in the archives; more insight would be good though.

Reply #4 – 2002-06-20 13:24:35

Quote

Originally posted by HotshotGG

Actually, this post is a few years old, believe it or not. I had been looking at the overview of Vorbis in the xiph.org archives and I stumbled across it. I found it interesting and wanted to know more about it myself; not a bad idea indeed. Pre-clamping would be adding a series of 0's to the window, if I am not mistaken? Or it may be something else entirely.

I haven't heard of it before, but I can make an educated guess what the idea might have been.

The 'envelope' of an instrument is the rough shape of its waveform. E.g. for a flute or something similar it would consist of two horizontal lines (mostly constant amplitude). For an instrument with an attack, it is two horizontal lines that almost overlap (near silence), then a point where they 'break apart' nearly vertically (the attack), followed by a slow convergence towards each other (decay). I'm not sure you can imagine what I'm trying to explain unless you are familiar with what an envelope is. (Maybe I should draw a graph, that would be clearer.)

Now, what pre-echo does is introduce extra oscillations before the attack that weren't there in the original. If you stored the original envelope in the ogg file, the pre-echo would jump outside the original envelope. You could detect this and 'clamp' (i.e. limit) the pre-echo to be within the original envelope, thereby reducing it.

I don't know if this is what Monty had in mind, cos I only just thought it up.
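To make the guess above concrete, here is a minimal Python sketch of clamping a decoded signal to a stored envelope. Everything in it is an assumption made for illustration: the per-segment peak envelope, the `hop` of 32 samples, the toy signal and the fake pre-echo are all made up, and nothing here is taken from the Vorbis source or from Monty's post.

```python
import math

def envelope(signal, hop=32):
    """Peak amplitude of each hop-sized segment: a crude amplitude envelope."""
    return [max(abs(s) for s in signal[i:i + hop])
            for i in range(0, len(signal), hop)]

def clamp_to_envelope(decoded, env, hop=32):
    """Limit every decoded sample to the stored envelope of its segment."""
    out = []
    for n, s in enumerate(decoded):
        limit = env[n // hop]
        out.append(max(-limit, min(limit, s)))
    return out

# Toy original: near silence for 256 samples, then a sharp attack that decays.
original = [0.001 * math.sin(0.3 * n) for n in range(256)]
original += [math.exp(-n / 100.0) * math.sin(0.3 * n) for n in range(256)]

# Fake "decoded" signal: the original plus made-up pre-echo before the attack.
decoded = list(original)
for n in range(128, 256):
    decoded[n] += 0.05 * math.sin(0.9 * n)

# The envelope comes from the original, so an encoder would have to store it.
env = envelope(original)
clamped = clamp_to_envelope(decoded, env)

pre_echo_before = max(abs(s) for s in decoded[128:256])
pre_echo_after = max(abs(s) for s in clamped[128:256])
print(pre_echo_before, pre_echo_after)
```

In this toy case the pre-echo region is pulled back down to the near-silent level of the original, while the samples after the attack pass through untouched. A real codec would also have to pay for transmitting the envelope, which this sketch ignores.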

Quote

I was trying to find some sort of correlation between that and what you said to me about knowing the difference between padding and impulse short blocks. I am not sure whether or not they are the same thing.

Padding and impulse blocks are both short blocks. They are related to how window switching works. When there is an attack, and the encoder needs to switch from the normal long window to a short window and back, more than one short window will be produced (hybrid blocks for the transition, an impulse spread over two short blocks, etc.). Padding and impulse simply refer to the use that the short block has. If it is the one with the attack, we want to allocate a lot of bits to this block to reduce the pre-echo. If it's 'just' a padding block, this is not needed, and we can save some bits.
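As a rough illustration of the impulse/padding distinction, the sketch below labels a run of short blocks by whether they contain the attack and then splits a bit budget accordingly. The function names, the 4:1 weighting and the block length are hypothetical; this is not how any particular encoder decides, only one way the idea could look.

```python
def label_short_blocks(num_blocks, block_len, attack_sample):
    """Label each short block 'impulse' if it contains the attack, else 'padding'."""
    roles = []
    for b in range(num_blocks):
        start, end = b * block_len, (b + 1) * block_len
        roles.append("impulse" if start <= attack_sample < end else "padding")
    return roles

def allocate_bits(roles, total_bits, impulse_weight=4):
    """Split a bit budget: an impulse block gets impulse_weight shares, padding gets one."""
    weights = [impulse_weight if r == "impulse" else 1 for r in roles]
    share = total_bits / sum(weights)
    return [round(w * share) for w in weights]

# Four short blocks of 256 samples (assumed size) with the attack at sample 600.
roles = label_short_blocks(num_blocks=4, block_len=256, attack_sample=600)
bits = allocate_bits(roles, total_bits=7000)
print(list(zip(roles, bits)))
```

The block holding the attack ends up with most of the budget and the padding blocks share the rest, which is the bit saving described above.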

Quote

By the way, while I am asking, you wouldn't happen to have any insight into bitstream scaling, a.k.a. "bitrate peeling"? I left a question in the development forum of the Vorbis section, but apparently nobody has any insight. I thought you might know something about that. I was reading about it in the archives; more insight would be good though.

Bitrate peeling works by structuring the codebooks such that the data you store is layered (I believe the vorbis source calls this 'cascading', but you'd need to check with monty) in such a way that you can gradually remove parts and the quality of the file scales nicely with what you remove. Compare it to progressive encoding with JPEG.

As for the actual implementation details, I think Monty is the only person to understand that.
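The layering idea can be sketched with a generic multi-stage quantizer, where each stage codes what the previous stage left over. This is only an analogy under my own assumptions (scalar values, three arbitrary step sizes); it is not the Vorbis codebook format, which I am not describing here.

```python
def encode_layers(values, steps=(0.5, 0.125, 0.03125)):
    """Quantize in stages; each stage codes the residual left by the previous one."""
    residual = list(values)
    layers = []
    for step in steps:
        indices = [round(r / step) for r in residual]
        residual = [r - i * step for r, i in zip(residual, indices)]
        layers.append((step, indices))
    return layers

def decode_layers(layers, keep):
    """Reconstruct from only the first `keep` layers; the rest are peeled away."""
    out = [0.0] * len(layers[0][1])
    for step, indices in layers[:keep]:
        out = [o + i * step for o, i in zip(out, indices)]
    return out

def max_error(a, b):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

values = [0.91, -0.37, 0.08, 0.66, -0.72]
layers = encode_layers(values)

# Error when keeping 1, 2 or all 3 layers: it shrinks as layers are added back.
errors = [max_error(values, decode_layers(layers, k)) for k in (1, 2, 3)]
print(errors)
```

Dropping the last layer costs some accuracy but still leaves a usable reconstruction, which is the same graceful degradation as the progressive JPEG comparison.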

--
GCP

Reply #5 – 2002-06-20 14:13:36

Quote

I haven't heard of it before, but I can make an educated guess what the idea might have been.

The 'envelope' of an instrument is the rough shape of its waveform. E.g. for a flute or something similar it would consist of two horizontal lines (mostly constant amplitude). For an instrument with an attack, it is two horizontal lines that almost overlap (near silence), then a point where they 'break apart' nearly vertically (the attack), followed by a slow convergence towards each other (decay). I'm not sure you can imagine what I'm trying to explain unless you are familiar with what an envelope is. (Maybe I should draw a graph, that would be clearer.)

Not bad. I had originally pictured something like that in my mind, but not to that extent. I understand where you're going with your explanation; a picture always works well though.

Quote

Now, what pre-echo does is introduce extra oscillations before the attack that weren't there in the original. If you stored the original envelope in the ogg file, the pre-echo would jump outside the original envelope. You could detect this and 'clamp' (i.e. limit) the pre-echo to be within the original envelope, thereby reducing it.

Everything seems pretty lucid here.

Quote

If it is the one with the attack, we want to allocate a lot of bits to this block to reduce the preecho. If it's 'just' a padding block, this is not needed, and we can save some bits.

I noticed a few impulse short blocks when I was decoding one of my Vorbis files in Winamp. I usually just casually watch the bit allocation as it's decoding, and I noticed it went instantly from ~104 kbps all the way up to ~160 kbps in less than a few seconds as it detected a sharp transient coming up in the bands and a lot of spectral energy. I would presume now that impulse short blocks were used on that section of the stream. I also noticed that padding short blocks must have been used after the transient region, as it quickly dropped back down to ~96 kbps.

On another note, I also have a track I encoded with RC3 that shows how the quantizer in RC3 would virtually quantize noisy bands of the spectrum down to 0 bits. The result is fairly obvious, and I am eager to see how these new "noise normalization" techniques Monty speaks of attempt to maintain band energy during quantization.

Quote

Bitrate peeling works by structuring the codebooks such that the data you store is layered (I believe the vorbis source calls this 'cascading', but you'd need to check with monty) in such a way that you can gradually remove parts and the quality of the file scales nicely with what you remove. Compare it to progressive encoding with JPEG.

Yes, that sounds about right. I couldn't figure out exactly what cascaded codebooks were meant for at first. Then I saw some material on bitstream scaling, how it works through packet truncation and how the packets need to be changed (or something of that nature) to allow for it, and I figured that was what they were used for. I still need to get more insight on that though. Always good to take a look anyway.

Quote

As for the actual implementation details, I think Monty is the only person to understand that.

I know what you're getting at here. I took a look at the analysis layer to try and decipher it, and by all means I think I am going to need to learn some more about trigonometry.

Reply #6 – 2002-06-20 14:26:11

Quote

Originally posted by HotshotGG

I noticed a few impulse short blocks when I was decoding one of my Vorbis files in Winamp. I usually just casually watch the bit allocation as it's decoding, and I noticed it went instantly from ~104 kbps all the way up to ~160 kbps in less than a few seconds as it detected a sharp transient coming up in the bands and a lot of spectral energy. I would presume now that impulse short blocks were used on that section of the stream.

Impulse short blocks are used in any case where you have short blocks, i.e. on any transient. It sounds more likely to me that the increase of spectral energy heightened the bitrate demands.

A short block is only about 6 milliseconds long, so the bitrate jump should be instantaneous, not just 'less than a few seconds'. It is possible for the average bitrate to rise if there are lots in quick succession, though.
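The "about 6 milliseconds" figure is just block length divided by sample rate. The sketch below works it out assuming a 256-sample short block, a 2048-sample long block and 44.1 kHz audio; those sizes are my assumptions for illustration and are not stated in this thread.

```python
def block_duration_ms(block_samples, sample_rate_hz):
    """Duration of one transform block in milliseconds."""
    return 1000.0 * block_samples / sample_rate_hz

short_ms = block_duration_ms(256, 44100)   # assumed short block size
long_ms = block_duration_ms(2048, 44100)   # assumed long block size
print(round(short_ms, 2), round(long_ms, 2))
```

With those assumed sizes a short block lasts a little under 6 ms, so a single impulse block is far too brief to show up as a jump lasting seconds in a bitrate display.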

--
GCP

Reply #7 – 2002-06-20 14:38:47

Actually, the bitrate jump was instantaneous now that I think about it. It seemed as if it happened within a few seconds. The signal was an electronica synth with a very fast drum roll, sort of like OGPULSE if you have tested that sample, only longer. Vorbis did a good job switching to impulse short blocks though.

Sorry for the inconvenience of tying up the MPEG-1 Layer 3 forum. I'll remember to start this question in the technical and quality-oriented discussion for Vorbis next time.

Reply #8 – 2002-06-20 15:54:26

Quote

Originally posted by HotshotGG
Sorry for the inconvenience of tying up the MPEG-1 Layer 3 forum. I'll remember to start this question in the technical and quality-oriented discussion for Vorbis next time.

Thread split and moved.

Reply #9 – 2002-06-23 15:10:42

Besides testing the experimental idea of pre-clamping the window for a fixed block size, I think Monty had been planning on using pre-clamping as an alternative to window switching in case there was a conflict with any patents on window switching that someone might bring up, which I doubt would ever be a problem.
