Full Version: Question about filter order when restoring audio from a bad source
Mekatype
I am about to start restoring a "Making Of" movie that came with an old game I own. The audio on this source is extremely bad: an 11025Hz mono 8bit file. I intend to make it a 48000Hz stereo 16bit file, with some enhancement applied along the way. The possible filter chains are:
  1. Convert to stereo --> SSRC --> Enhancement
  2. Convert to stereo --> Enhancement --> SSRC
  3. SSRC --> Convert to Stereo --> Enhancement
  4. Enhancement --> Convert to stereo --> SSRC
  5. SSRC --> Enhancement --> Convert to stereo
  6. Enhancement --> SSRC --> Convert to stereo
I was wondering if you could tell me which way should theoretically be better and why. If it's any help, the SSRC line to be used is

CODE
ssrc.exe --rate 48000 --twopass --dither 1 --bits 16 --pdf 1 input.wav output.wav


The stereo conversion will be done using a VST plugin, and the enhancement will be done with Nero Wav Editor, first correcting the DC offset (is that necessary?) and then using the "add high frequencies" and "Megabass" Band Extrapolation settings.
Dynamic
Option 1 is what I'd choose, though practically there's little to choose between many.

Mono -> Stereo 11025 Hz -> Stereo 48000 Hz

(working at 48000 Hz produces fractionally less dither noise power per hertz than doing any mathematical processing on the 11025 Hz signal)

-> Enhancement (with re-dither assumed to be done at each processing step).
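
To put a rough number on the "noise power per hertz" remark, here is a minimal Python sketch (not from the thread). It assumes white dither whose total power is fixed by the bit depth, so the same power is simply spread over a wider band at the higher sample rate; the function name is made up for illustration.

```python
import math

def noise_density_change_db(old_rate_hz, new_rate_hz):
    """Change in white-noise power per hertz when the same total noise
    power is spread over 0..new_rate/2 instead of 0..old_rate/2.
    Assumption: total dither power depends only on bit depth."""
    return -10.0 * math.log10(new_rate_hz / old_rate_hz)

change_db = noise_density_change_db(11025, 48000)
print(f"noise density change: {change_db:.2f} dB")
```

Under that assumption the density drops by a little over 6 dB, but dither noise at 16-bit or 24-bit is so low that the practical difference remains small.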

If really fussy you could use 24-bit PCM out of SSRC then down-convert to 16-bit after the Enhancement if 16-bit is necessary.

Personally, I'd be wary of artificially generating (guessing) high frequencies that must have been completely eliminated from the original by encoding it at 11025 Hz, but if it sounds OK to you, by all means go ahead.
Mekatype
QUOTE (Dynamic @ Mar 20 2007, 20:49) *
Option 1 is what I'd choose, though practically there's little to choose between many.

Mono -> Stereo 11025 Hz -> Stereo 48000 Hz

(working at 48000 Hz produces fractionally less dither noise power per hertz than doing any mathematical processing on the 11025 Hz signal)
...


Then wouldn't it be a better idea to do Mono -> Mono 48000 Hz -> Stereo 48000Hz so as to not do any processing at all on the 11025 signal?
Dynamic
There's very little in it either way. Mono -> Stereo simply involves duplicating the info in a second channel so doesn't require dither.

Your suggestion runs the upsampling and dithering algorithms on one channel instead of two, so it needs half the work and will be the faster of the two.

The dither noise when upsampling would be correlated if you convert to stereo afterwards but would be uncorrelated when upsampling after converting to stereo (assuming the dither signal is independent for each channel).

Uncorrelated noise powers are additive (rms amplitude is sqrt(2) = 1.41 times as great as one channel played alone, so noise power is 2 times as great), while the non-dither music signal is definitely fully correlated between left and right channels in this case.

Correlated noise amplitude, not power, is additive (giving 4-times the noise power) if you have the same path-length from speakers to ear or is something of a comb filter if you have different path-lengths. Dither noise in 16-bit audio is negligible, so there's really nothing in it, though technically, I'd slightly prefer uncorrelated dither noise.
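
The 2-times versus 4-times figures can be checked numerically. This is a minimal Python sketch (not from the thread) that uses uniform random noise as a stand-in for dither and sums two channels as if both reached the ear over equal path lengths; all names are illustrative.

```python
import random

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

rng = random.Random(0)
n = 100_000
left = [rng.uniform(-1.0, 1.0) for _ in range(n)]
right_independent = [rng.uniform(-1.0, 1.0) for _ in range(n)]

single = mean_power(left)
# Independent noise in each channel: powers add.
uncorrelated_ratio = mean_power([a + b for a, b in zip(left, right_independent)]) / single
# Identical noise in both channels: amplitudes add, so power quadruples.
correlated_ratio = mean_power([a + a for a in left]) / single

print(f"uncorrelated: {uncorrelated_ratio:.2f}x, correlated: {correlated_ratio:.2f}x")
```

The uncorrelated ratio comes out close to 2 (it varies slightly with the random seed), while the correlated ratio is 4.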
Mekatype
QUOTE (Dynamic @ Mar 25 2007, 16:53) *
There's very little in it either way. Mono -> Stereo simply involves duplicating the info in a second channel
...


Remember that the stereo conversion I intend to do won't involve just making a dual channel mono file but rather using a VST plugin to actually create a difference between both channels.
Dynamic
Ah I didn't realise that when I replied.

Well, to be on the safe side, as you suggested resample the mono to 48 kHz first in case the stereo effect modifies the frequencies. Also, this ensures you have the greater bit-depth, which is another thing I hadn't noticed (only 8-bit source at 11025 Hz).

Personally I'd start with:
• 11025 Hz 8-bit mono
• as input to SSRC, outputting 48000 Hz 24-bit mono
(if your audio editor can input 24-bit, it's good practice to do all your processing at this resolution or better (preferably floating point), then dither down to 16-bit output only as the last stage, even though your 8-bit source means this is of negligible benefit in this case).
• then apply stereo effects and other plugins
• finally convert to destination bit-depth, 16-bit, with any dither type you like.
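
The last step of the list above (dither down to 16-bit only once, at the end) might look like the following minimal Python sketch. It is not SSRC's algorithm; it assumes floating-point samples in the range -1.0 to 1.0 and plain triangular (TPDF) dither with no noise shaping, and the function name is hypothetical.

```python
import random

def to_int16_with_tpdf_dither(samples, rng):
    """Quantize floats in [-1.0, 1.0] to 16-bit integers, adding
    triangular (TPDF) dither of +/-1 LSB peak before rounding."""
    out = []
    for x in samples:
        dither = rng.random() - rng.random()        # triangular PDF, in LSB units
        value = round(x * 32768.0 + dither)
        out.append(max(-32768, min(32767, value)))  # clamp to the 16-bit range
    return out

rng = random.Random(1)
processed = [0.0, 0.25, -0.5, 0.999]   # stand-in for the fully processed signal
final16 = to_int16_with_tpdf_dither(processed, rng)
print(final16)
```

Everything before this call would stay at 24-bit or floating point, so the 16-bit quantization error is introduced only once.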
Mekatype
QUOTE (Dynamic @ Mar 27 2007, 13:34) *
Ah I didn't realise that when I replied.

Well, to be on the safe side, as you suggested resample the mono to 48 kHz first in case the stereo effect modifies the frequencies. Also, this ensures you have the greater bit-depth, which is another thing I hadn't noticed (only 8-bit source at 11025 Hz).

Personally I'd start with:
• 11025 Hz 8-bit mono
• as input to SSRC, outputting 48000 Hz 24-bit mono
(if your audio editor can input 24-bit, it's good practice to do all your processing at this resolution or better (preferably floating point), then dither down to 16-bit output only as the last stage, even though your 8-bit source means this is of negligible benefit in this case).
• then apply stereo effects and other plugins
• finally convert to destination bit-depth, 16-bit, with any dither type you like.


I see, doing the enhancement after converting to stereo should have the added benefit of further augmenting the difference between the two channels. Makes perfect sense, thanks for the explanations and for your patience :) Once this is done I'll post a few samples for a sort of mini-listening test, so keep checking this thread :)
Mekatype
Ok, first two samples are up. Please listen to them and tell me what differences (if any) you notice between them:

[links removed, new losslessly compressed set coming up soon]
Dynamic
Having just listened to the original 11.025 kHz, 8-bit, 2.wav it certainly sounds distorted (not grossly distorted but noticeably degraded) during voice segments, especially the female towards the end. It isn't badly clipped there (it is elsewhere).

If you have distortions it will be tough to disguise them and give the impression of higher quality.

(I didn't fancy waiting for 30 MB for 1.wav, though I might have downloaded something like a lame -V3 --vbr-new MP3 to assess the sound quality)
rsdio
In general, I'd say promote to 24-bit as the first step, maintain 24-bit throughout all intermediate processing steps, and dither only as the last step. You should probably do the sample rate conversion before synthesizing stereo, just in case there are some frequencies synthesized.

The only problem I see is your command-line which combines sample rate conversion and dithering in one step. You should only dither once, and only at the end of your processing. Can you use sample rate conversion without dithering in that command-line program? Can you use another sample rate conversion tool?

Note: I have examined sample rate conversion tools on the Mac, and a good SRC will create true 24-bit samples from a 16-bit source, so perhaps you should start with a first step of converting from 11.025 kHz to 48 kHz with 24-bit output - i.e. let the SRC be your 8-bit-to-24-bit conversion.
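
For reference, plain bit-depth promotion (without any resampling) is exact and adds no noise of its own. A minimal Python sketch, not from the thread: 8-bit PCM in a WAV file is unsigned with silence at 128, so promoting it to signed 24-bit is a re-centring plus a left shift. The function name is made up for illustration.

```python
def promote_u8_to_s24(sample_u8):
    """Map an unsigned 8-bit WAV sample (0..255, silence at 128) to a
    signed 24-bit value by re-centring and shifting left by 16 bits."""
    return (sample_u8 - 128) << 16

print(promote_u8_to_s24(0), promote_u8_to_s24(128), promote_u8_to_s24(255))
```

A sample rate converter with 24-bit output goes further than this, because its interpolated samples genuinely use the extra low-order bits.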
Mekatype
Dynamic: I did think about posting the samples in lossy format (Vorbis aoTuVb5 @ q6 to be exact), but since the files are so similar-sounding I figured it was best to keep them uncompressed. It was kinda silly to upload a WAV when it could have been easily compressed to FLAC. I'll post a new set of files soon. In the meantime though, what would you suggest I do in order to eliminate the distortions and clipping?

rsdio: that's what I'm doing, both the SSRC line in the OP and SRCDrop do both things at once.
rsdio
QUOTE (Mekatype @ Apr 1 2007, 19:01) *
rsdio: that's what I'm doing, both the SSRC line in the OP and SRCDrop do both things at once.
I've never used SSRC, but I think the command line for what I am suggesting might be:
CODE
ssrc.exe --rate 48000 --twopass --dither 0 --bits 24 --pdf ? input8.wav mono24.wav
Then do your synthesized stereo via the VST plug and finally apply dither once.
Can SSRC apply dither without converting the sample rate?
CODE
ssrc.exe --rate 48000 --twopass --dither 1 --bits 16 --pdf ? stereo24.wav final16.wav

Sorry I don't know SSRC. I would typically do all of this within my plugin host, Logic Pro.
Dynamic
QUOTE (Mekatype @ Apr 2 2007, 03:01) *
Dynamic: I did think about posting the samples in lossy format (Vorbis aoTuVb5 @ q6 to be exact), but since the files are so similar-sounding I figured it was best to keep them uncompressed. It was kinda silly to upload a WAV when it could have been easily compressed to FLAC. I'll post a new set of files soon. In the meantime though, what would you suggest I do in order to eliminate the distortions and clipping?

rsdio: that's what I'm doing, both the SSRC line in the OP and SRCDrop do both things at once.


Given that the source audio isn't high quality, fairly-transparent lossy (even much less than -q6, though for Vorbis, -q6 guarantees lossless stereo coupling) would still give a very good idea of the sound of the effects you've applied, which is all that matters. The original audio, being mono, 8-bit and 11.025 kHz, is already fairly low bitrate (about 88 kbps), though lossless compression may help.
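
The bitrate figure is just sample rate times bit depth times channel count. A small Python sketch (not from the thread, with a made-up function name) for the source and for the intended 48000 Hz, 16-bit stereo target:

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

source_bps = pcm_bitrate_bps(11025, 8, 1)    # 88200 bits/s, about 88 kbps
target_bps = pcm_bitrate_bps(48000, 16, 2)   # 1536000 bits/s
print(source_bps, target_bps)
```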

Regarding disabling dither in SSRC: if you're converting to 24-bit, the dither noise power will be incredibly tiny, and it is additive in raw power (not decibels) with each processing step. Theoretically you should always dither at every stage of processing, preferably with flat, unshaped dither (--dither 1 --pdf 1) until the final stage, where you might choose shaped dither if you're reducing bit-depth to 16-bit. The very tiny amount of noise (especially tiny at 24-bit) is a worthwhile tradeoff against truncation distortion, which is often tonal and thus more noticeable than white dither noise when it occurs. (I differ from Naoki Shibata's words in ssrc.txt here, but agree with Bob Katz at digido.com.)
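
A toy illustration of that tradeoff, as a minimal Python sketch (not from the thread). It uses rounding rather than truncation, works in LSB units, and takes a constant signal sitting at 0.3 LSB: without dither the detail vanishes entirely, while with TPDF dither it survives on average at the cost of added noise. The names are illustrative.

```python
import random

def quantize(x_lsb, rng=None):
    """Round a value given in LSB units to the nearest integer step.
    If rng is supplied, add TPDF dither (+/-1 LSB peak) first."""
    if rng is not None:
        x_lsb += rng.random() - rng.random()
    return round(x_lsb)

rng = random.Random(2)
level = 0.3      # a signal sitting at 0.3 LSB
n = 20_000
plain_mean = sum(quantize(level) for _ in range(n)) / n
dithered_mean = sum(quantize(level, rng) for _ in range(n)) / n
print(plain_mean, round(dithered_mean, 2))
```

The undithered mean is exactly zero, so the quantization error depends entirely on the signal; the dithered mean lands near 0.3, with the error turned into noise.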

The distortions in the original 11.025 kHz WAV are baked in: they will have created new spurious frequencies and can't be removed. There's barely any consecutive-sample clipping in the original WAV, so I presume you're referring to clipping introduced by your processing.
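
One simple way to look for consecutive-sample clipping, as a minimal Python sketch (not the method used in the thread): count runs of two or more adjacent samples stuck at either extreme of the unsigned 8-bit range. The function name and the run threshold are assumptions.

```python
def count_clipped_runs(samples_u8, min_run=2):
    """Count runs of at least min_run consecutive samples sitting at
    either 8-bit extreme (0 or 255)."""
    runs = 0
    run_length = 0
    for s in samples_u8:
        if s in (0, 255):
            run_length += 1
            if run_length == min_run:   # count each run once, when it qualifies
                runs += 1
        else:
            run_length = 0
    return runs

example = [128, 255, 255, 255, 100, 0, 0, 90, 255, 120]
print(count_clipped_runs(example))
```

A single isolated full-scale sample (like the last 255 in the example) is not counted, since it may just be a legitimate peak.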

If your sample-rate conversion (SRC) produces clipping, which it probably does given that you have full-scale audio, you could apply a scaling factor to reduce the loudness a little and avoid clipping (SSRC does this automatically if you use the --twopass option).

As you're then going to apply further processing, this may increase peak amplitudes further, so I'd suggest that you can forget the --twopass option and instead apply a fixed attenuation, for example halving the amplitude using --att 6 to apply 6 dB attenuation in SSRC, which you're likely to find sufficient to avoid clipping. If you want to volume-match, you can normalise afterwards to get as close as possible to the original volume without clipping.
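
The attenuate-then-normalise idea can be sketched in Python as follows (not from the thread, and not how SSRC implements --att). It assumes floating-point samples with full scale at 1.0; the helper names are hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def attenuate(samples, att_db):
    """Reduce the level by att_db decibels."""
    gain = db_to_gain(-att_db)
    return [x * gain for x in samples]

def normalise(samples, target_peak=1.0):
    """Scale so the largest absolute sample equals target_peak."""
    peak = max(abs(x) for x in samples)
    if peak == 0.0:
        return list(samples)
    scale = target_peak / peak
    return [x * scale for x in samples]

full_scale = [0.0, 1.0, -1.0, 0.5]
quieter = attenuate(full_scale, 6.0)   # peaks now near 0.5, leaving headroom
restored = normalise(quieter)          # peak brought back to 1.0 at the end
print(round(db_to_gain(-6.0), 4))
```

Note that 6 dB is very nearly, but not exactly, a halving of amplitude (the factor is about 0.501).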

If you wish to end up with 16-bit audio, you should do that conversion as the final stage (SSRC can do this)
Mekatype
Ok, here are the two clips in APE format:

http://www.bestsharing.com/files/HrvWZf8258256/1.ape.html

http://www.bestsharing.com/files/XkJ9eG258238/2.ape.html

What I need you to do is listen to both and tell me if there are any noticeable differences between them. If either sounds wrong, then please suggest what I could do to fix or alleviate the issue. After that, I'll post a few stereo conversions so you can listen to them and tell me which you prefer.
rsdio
QUOTE (Mekatype @ Apr 7 2007, 12:42) *
Ok, here are the two clips in APE format:

Could you put these up in FLAC format? APE is really hard to decode on Mac OS X. There are a lot of audio people in the Mac community who could help you out with this question.
Mekatype
That was my first option, but FLAC FrontEnd kept giving me issues. Can you recommend another GUI for it?
pepoluan
You are using Windows, right? So foobar2000 is the best.
Mekatype
Loading the WAV, then selecting "Convert"-->"Convert to..." with these settings:

[settings screenshot missing]

gets me this error message:

"Error writing to file (Encoder has terminated prematurely with code 1; please re-check parameters) : file://C:\2.flac"

What's going on?
Mekatype
Anyone?
rsdio
QUOTE (Mekatype @ Apr 13 2007, 07:34) *
Anyone?
Just use the command-line flac. You only have two files to compress, so it shouldn't take too long to manually run flac a couple of times. I don't use Windows, and I don't even use the GUI front-ends for flac on Mac OS, so I have no hints to offer.

If that doesn't work, just upload your files uncompressed. I'd listen to them, because I'm very curious about your project to upconvert the low-fi original, but I can't listen until they're in a format that's available everywhere.
Mekatype
Will do, thanks for the suggestion.
Mekatype
Done. I had to use the --lax switch and resample to only 24-bit, but it finally worked. Here are the files:

http://www.bestsharing.com/files/ITw3NaJ262924/1.flac.html
http://www.bestsharing.com/files/lOC0zZq262901/2.flac.html

Remember: the idea is that you listen to both files and tell me if there's any difference between the two, then suggest some form of processing besides the one I plan to do. Maybe this section of the thread should be moved to the listening test subforum?
Mekatype
Thread posted. You can find it here:

http://www.hydrogenaudio.org/forums/index....showtopic=54255