Full Version: Adaptive Noise Shaping (ANS)
SebastianG
Hello fellow HA members.

I developed something that I currently call ANS. It's basically a tool for developers in the form of a small library that helps with designing noise shaping filters and also contains the filter implementation itself, usable as a noise shaper. I had the idea for ages but never got motivated enough to actually try it. Since lossyWAV seems to be drawing enough attention and could also benefit from ANS, I started coding this library.

The primary purpose of this thread is to draw other developers' attention to this and to discuss its possible uses. This could turn out to be another motivator to further the project. The uses I can think of are
  • high rate steganography
  • improved lossyWAV
  • improved lossy-only mode of WavPack
  • any other "transform-less" codec
But this lib would only be a small building block needed for these applications. It doesn't contain a psychoacoustic model that tells the filter design code what the desired PSD of the quantization noise actually is. IMHO such a model could be another library, possibly extracted from another lossy encoder (e.g. mppenc, libVorbis, LAME...). So, if there's any interest in the aforementioned applications or similar ones, I suggest the next problem to tackle is a psychoacoustic model. Maybe there's some skilled developer out there who is already familiar with the related code of one of the mentioned projects and is willing to help out here.

To be honest I do think transform-less lossy codecs have their purpose and fill a small gap between something like MP3 and FLAC. It's the "I think lossless coding is a waste of space but I'd like to have enough headroom"-gap. Being "transform-less" of course has its advantages and disadvantages like lower algorithmic delay, lower decoding complexity, worse diagonalization. The lossyWAV application also has its advantages and disadvantages compared to a dedicated codec like WavPack's lossy mode. But they all could share the same psychoacoustic model and noise shaping code.

Joining forces is probably the best thing to do as many people's spare time is limited.

Opinions?

Edit: Technical details follow:
I decided to try a frequency-warped all-pole filter in its lattice structure, which is what Edler & Schuller used for their "new paradigm codec". The rationale behind this is that all-pole filters are easily designed without major spectral distortions in areas where the response is higher. Frequency warping is used to exploit the fact that at lower frequencies the masking curve usually changes more quickly with respect to frequency than at higher frequencies (wider critical bands). The lattice structure is especially suited for interpolating filter parameters. The difference from what Edler & Schuller described is that this library implements the filter as a noise shaper instead of a pre/post filter pair.
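
To make the warping remark a bit more concrete, here is a minimal sketch. It assumes the usual construction for warped filters, in which each unit delay is replaced by a first-order all-pass (z^-1 - lambda)/(1 - lambda*z^-1). The helper name "warp_frequency" is made up for this illustration and is not taken from the library.

```cpp
#include <cassert>
#include <cmath>

// Maps a frequency w (radians, 0..pi) onto the warped frequency axis that
// results from replacing each unit delay with a first-order all-pass with
// coefficient lambda (|lambda| < 1).
// Hypothetical helper for illustration, not part of the ANS library.
double warp_frequency(double w, double lambda) {
    assert(lambda > -1.0 && lambda < 1.0);
    // Negated phase of the all-pass at frequency w.
    return w + 2.0 * std::atan2(lambda * std::sin(w),
                                1.0 - lambda * std::cos(w));
}
```

With a positive lambda such as 0.5, small values of w are stretched by roughly (1+lambda)/(1-lambda) = 3 while 0 and pi stay fixed, so a uniform grid on the warped axis covers the low frequencies more densely.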


Cheers,
SG
SebastianG
Let me just show you an example of how it can be used and what it does:
CODE

#include <iostream>
#include <cmath>

#include "fdesign.hpp"
#include "lattice.hpp"

using std::cout;
using std::endl;

/**
* "PSD" function callback (power spectral density) for testing
* (will be non-uniformly sampled due to warping!)
*/
double my_psd(double freq) {
    // frequency is in radians (pi = nyquist frequency)
    freq /= 3.14159265;
    double dB;
    if (freq<=0.125)
        dB = 0;
    else if (freq>=0.250)
        dB = 20;
    else {
        freq = (freq-0.125)/0.125;
        freq = std::sin(freq*3.14159265/2);
        freq *= freq;
        dB = freq * 20;
    }
    // 0 dB is the reference noise level you'd
    // expect without noise shaped word length
    // reduction
    return std::pow(10.0,dB/10.0);
}

int main()
{
    const int fftlen = 512; // should be enough due to warping
    const int order = 16; // filter order
    const double lambda = 0.5; // warping parameter

    // initialize "filter design engine" ...
    filter::filt_design<double> fide (fftlen,order,lambda);

    // construct filter object ...
    filter::wapl_filter<double> filt (order,lambda);

    double gain = fide.design(&my_psd,filt.get_roi());
    cout << endl << "gain = " << gain << ';' << endl;
    double bits2remove = std::log(gain) / std::log(2.0);
    cout << "bits2remove = " << bits2remove << ';' << endl;

    cout << endl << "impulse_response = [";
    // The value of 'next_error' would be the actual rounding error:
    double next_error = 1.0;
    for (int k=0; k<50; ++k) {
        double sig = 0.0; // signal = silence
        double qqq = sig + filt.get_feedback() + next_error;
        double noise = qqq - sig;
        cout << ' ' << noise;
        // The filter needs to know the current
        // 'filtered error' sample:
        filt.update(noise);
        // no errors from now on so we'll get the noise
        // shaper's impulse response printed out ...
        next_error = 0.0;
    }
    cout << " ];" << endl << endl;
}

"my_psd" is the function that needs to get replaced with the masking threshold results of a proper psychoacoustic model.

The output is:
CODE

gain = 6.49398;
bits2remove = 2.6991;

impulse_response = [ 1 -0.811359 -0.338102 -0.0113227 0.153058 0.178053 0.118035 0.021619 -0.048347 -0.0844135 -0.0746955 -0.0326362 0.0114296 0.0382008 0.0436505 0.0309537 0.00762708 -0.0153332 -0.0274405 -0.0243924 -0.0102578 0.00585966 0.0156875 0.0159327 0.00875903 -0.000658596 -0.00742622 -0.00918076 -0.0064888 -0.00175143 0.00240633 0.00435756 0.00388796 0.00188158 -0.000365576 -0.00181118 -0.00204268 -0.00128425 -0.000127748 0.00082474 0.0012263 0.0010647 0.000570949 4.66137e-05 -0.000288213 -0.000363705 -0.000240982 -4.2021e-05 0.000122792 0.000196545 ];


Copy and paste this into a Matlab/Octave shell and then type "freqz(gain.*impulse_response);" to see how well the magnitude response matches the "my_psd" function.
jmvalin
QUOTE(SebastianG @ May 26 2008, 22:33) *

To be honest I do think transform-less lossy codecs have their purpose and fill a small gap between something like MP3 and FLAC. It's the "I think lossless coding is a waste of space but I'd like to have enough headroom"-gap. Being "transform-less" of course has its advantages and disadvantages like lower algorithmic delay, lower decoding complexity, worse diagonalization. The lossyWAV application also has its advantages and disadvantages compared to a dedicated codec like WavPack's lossy mode. But they all could share the same psychoacoustic model and noise shaping code.

Edit: Technical details follow:
I decided to try a frequency-warped all-pole filter in its lattice structure, which is what Edler & Schuller used for their "new paradigm codec". The rationale behind this is that all-pole filters are easily designed without major spectral distortions in areas where the response is higher. Frequency warping is used to exploit the fact that at lower frequencies the masking curve usually changes more quickly with respect to frequency than at higher frequencies (wider critical bands). The lattice structure is especially suited for interpolating filter parameters. The difference from what Edler & Schuller described is that this library implements the filter as a noise shaper instead of a pre/post filter pair.


Personally, I've got my doubts regarding the use of linear prediction (any "transform-less" codec using something else) for anything other than speech and low-quality codecs (non-pejorative: I consider Speex low-quality in the sense that it can never be transparent at full bandwidth). I'm still trying to find any samples encoded with Schuller et al.'s LP-based low-delay codec to see for myself what kind of artefacts it has. One sure thing, though, is that linear prediction can be a pain to get right (and stable!). I've always thought warping just made things worse. Is that what you observed when implementing that?

Also, I'm not quite sure I understand what your software does if it doesn't compute the masking curve. Does it create the LP filter from a curve?
SebastianG
As far as I know there are two different but related codecs by the Fraunhofer people. They both use the pre/post filter idea but one of them also uses a transform (see paper I linked to) instead of LPC (see their "ULD" version) to decorrelate.

If you focus on higher rates and quality levels (enough headroom between masking threshold and quantization noise) I think the following approach isn't such a bad idea:

QUOTE(jmvalin @ May 30 2008, 08:19) *

One sure thing though is that linear prediction can be a pain to get right (and stable!). I've always thought warping just made things worse. Is that what you observed when implementing that?

Consider FLAC: The quantized LPC filter coefficients might even correspond to an unstable synthesis filter. But it won't go crazy because the decoder exactly reverses what the encoder did in a deterministic manner. This could also be the basis for a lossy codec where the residual is quantized to some other integer samples. This quantization error is accounted for during the prediction of the next sample, which makes the quantization noise white. Doing LPC "not right" here only reduces compression efficiency. I don't really see the need to apply any warping during linear prediction.
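
A minimal sketch of that closed-loop idea, under stated assumptions: a trivial one-tap predictor and a uniform quantizer with step size "step". Neither is taken from FLAC or from any existing codec, and all names are made up for illustration. The point is only that the predictor runs on the reconstructed samples, so the error in each output sample is just that sample's residual rounding error.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Closed-loop (DPCM-style) coding of integer samples. The predictor sees the
// *reconstructed* signal, exactly as a decoder would, so rounding errors do
// not accumulate: each output sample is off by at most step/2.
// Illustrative assumptions: one-tap predictor, uniform quantizer.
std::vector<long> encode_decode(const std::vector<long>& in, long step) {
    assert(step >= 1);
    std::vector<long> recon;
    recon.reserve(in.size());
    long prev = 0;                       // previous reconstructed sample
    for (std::size_t n = 0; n < in.size(); ++n) {
        long pred = prev;                // trivial predictor: repeat last output
        long res = in[n] - pred;         // prediction residual
        // Quantized residual index; this is what would be transmitted.
        long q = std::lround(static_cast<double>(res) / step);
        long out = pred + q * step;      // what the decoder reconstructs
        recon.push_back(out);
        prev = out;                      // feed the decoded value back
    }
    return recon;
}
```

A poor predictor only makes the residuals larger (more bits); it does not change the bound on the reconstruction error. With step = 1 the scheme is lossless.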

What's left is to move the quantization noise to places where it doesn't bother people. This problem can be solved independently by noise shaping. Note: linear prediction and noise shaping are orthogonal in this case. They don't interfere. This is a good thing because the decoder doesn't need to care about noise shaping filters at all. Noise shaping wouldn't be part of a format specification. You could even write an encoder that outputs a valid FLAC stream.
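
As a toy illustration of encoder-side noise shaping, here is a sketch with a fixed first-order error feedback. This filter is an assumption chosen for brevity and is not the warped lattice filter from the library; the function name is made up as well.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Requantizes x to multiples of `step` with first-order error feedback.
// The total noise y[n] - x[n] becomes e[n] - e[n-1], where e is the plain
// rounding error, so the noise is pushed towards high frequencies.
// Fixed toy filter, assumed for illustration only.
std::vector<double> shape_and_quantize(const std::vector<double>& x,
                                       double step) {
    assert(step > 0.0);
    std::vector<double> y;
    y.reserve(x.size());
    double e_prev = 0.0;                         // previous rounding error
    for (std::size_t n = 0; n < x.size(); ++n) {
        double v = x[n] - e_prev;                // subtract fed-back error
        double q = step * std::round(v / step);  // plain rounding
        e_prev = q - v;                          // rounding error of this sample
        y.push_back(q);
    }
    return y;
}
```

The output is just a sequence of quantized samples. Nothing about the feedback filter has to be known on the decoding side, which is why the shaping filter can change from encoder to encoder without touching the format.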

QUOTE(jmvalin @ May 30 2008, 08:19) *

Also, I'm not quite sure I understand what your software does if it doesn't compute the masking curve. Does it create the LP filter from a curve?

Yes. I mentioned the rationale behind frequency warping. This isn't something I came up with (Edler et al., A. Härmä, ...). I merely applied it to the noise shaping filter case. Edler et al. used it as a pre/post filter pair where the decoder needs to apply the "post-filter" as a final step. If you do "frequency warping right" there are no stability issues. At least I don't see why there should be any. The above sample code already shows all its features. It can currently "only" design and apply the filters. What makes it non-trivial IMHO is the frequency warping part. If you take the lattice all-pole filter structure and replace all the delay elements with all-pass filters you'll get delay-free loops. The filter implementation solves this problem in a way so that it can still be used as a noise shaping filter (see sample code). The implementation should be as stable as a non-warped filter because I didn't change the filter's lattice structure. The filter design part (computation of reflection coeffs) is of course also affected by frequency warping. A function representing the desired response is nonuniformly sampled, a warped autocorrelation is computed via FFT and used as input for the Durbin-Levinson recursion. So, it's a rather small building block (less than 500 lines of code actually). Still, I consider it to be enough black magic that it is better hidden behind an easy interface for everyone who likes to toy with adaptive noise shaping.
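
For readers unfamiliar with the last design step, here is a minimal sketch of a textbook Levinson-Durbin recursion that turns an autocorrelation sequence into reflection coefficients. It is not the library's code: the function name, the sign convention (A(z) = 1 + a1*z^-1 + ...) and the interface are assumptions, and the warped autocorrelation via FFT is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Levinson-Durbin recursion: autocorrelation r[0..order] -> reflection
// coefficients k[0..order-1]. `err` receives the final prediction error power.
// Assumed sign convention: A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
std::vector<double> levinson_durbin(const std::vector<double>& r,
                                    std::size_t order, double& err) {
    assert(r.size() > order && r[0] > 0.0);
    std::vector<double> a(order + 1, 0.0), tmp, k(order, 0.0);
    a[0] = 1.0;
    err = r[0];
    for (std::size_t m = 1; m <= order; ++m) {
        if (err <= 0.0) break;           // perfectly predictable: stop early
        double acc = r[m];
        for (std::size_t i = 1; i < m; ++i) acc += a[i] * r[m - i];
        double km = -acc / err;          // m-th reflection coefficient
        k[m - 1] = km;
        tmp = a;                         // order update needs the old coeffs
        for (std::size_t i = 1; i < m; ++i) a[i] = tmp[i] + km * tmp[m - i];
        a[m] = km;
        err *= (1.0 - km * km);          // error power shrinks each step
    }
    return k;
}
```

For a valid (positive definite) autocorrelation every reflection coefficient has magnitude below 1, which is the standard stability condition for the all-pole lattice built from them.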

Cheers,
SG