enhanced aac+ to aac lc
wind
post Nov 2 2012, 09:53
Post #1





Group: Members
Posts: 10
Joined: 1-November 12
Member No.: 104229



Hi everyone, has anybody used the Enhanced aacPlus code (3GPP TS 26.410) to encode/decode plain AAC-LC only, without SBR?

I modified the Enhanced aacPlus encoder so that there is no SBR. When the decoder decodes the new .3gp file, no SBR is decoded, but the output sample rate is halved: AAC-LC works at half the sample rate while SBR works at the original rate, and in the decoder it is SBR that restores the original sample rate, which is now missing.
Does anyone know how to make the decoder's output sample rate match that of the original input file? I tried modifying the encoder to encode at the original sample rate, but then there is no sound; and if I keep the half-rate .3gp and modify the decoder to change the sample rate and frame size, the result sounds strange...
Alternatively, would it be acceptable to upsample the decoder's half-rate output in MATLAB?

Thanks in advance.

This post has been edited by wind: Nov 2 2012, 10:29
Dynamic
post Nov 5 2012, 19:30
Post #2





Group: Members
Posts: 793
Joined: 17-September 06
Member No.: 35307



That's from the 3GPP code, isn't it? I think the intention of spline_resampler.c is to allow handsets whose DACs don't support the sample rate of a received file to downsample to a supported rate.

As you want to upsample, I'm not sure this code is directly applicable, but I haven't looked into it.

With downsampling, you need to filter first to remove frequencies above the Nyquist limit, then interpolate to the new, lower sampling rate.
With upsampling, you need to interpolate to the new higher sampling rate then filter afterwards to remove any frequencies that have been introduced above the Nyquist limit of the lower rate.
In theory, whichever way you're going between the same pair of sampling rates (up or down), the cut-off frequency should be the same, assuming an ideal filter.

The Speex resampler code looks useful: it has a liberal license, works for arbitrary rates, and implements upsampling intelligently. It recognises that a downsampling filter must ensure good attenuation at and above the Nyquist limit, whereas for upsampling the content is already low very close to the Nyquist limit and zero above it, until your chosen interpolation method introduces aliasing. The filter can therefore be more relaxed about attenuation close to the limit and preserve audio frequencies better by choosing a slightly higher cut-off frequency when upsampling than when downsampling. It also offers fixed-point and floating-point versions, which you can choose depending on your hardware, and I believe it has been tested when compiled for numerous popular platforms (certainly the Opus source code, which includes the same resampler, was tested very widely prior to IETF standardization).

The Speex one calculates the sinc function on the fly and the cut-off mathematically, has a number of Kaiser window functions pre-calculated in the source code, and includes some values for adjusting the filter cut-off frequency for upsampling versus downsampling.

It can essentially be treated as a black box that just does the job without having to understand how.

(P.S. That's the right SoX project you linked to a few posts above, and their resampling code has been implemented in a fb2k plugin which you mentioned in the other link. Some people get very picky about inaudible differences that can show up on graphs, where SoX resampler performs very well. I doubt there's an audible difference from fb2k's PPHS resampler or speex's for normal sampling rates. I guess there's a modest chance of slight audibility when upsampling from very low sample rates such as 8kHz.)
wind
post Nov 6 2012, 00:02
Post #3





Group: Members
Posts: 10
Joined: 1-November 12
Member No.: 104229



QUOTE (Dynamic @ Nov 5 2012, 19:30) *
That's from the 3GPP code, isn't it? [...] It can essentially be treated as a black box that just does the job without having to understand how.

Yes, that is from the 3GPP code.
Thank you so much, Dynamic, I can learn something from your reply.
How about the 'mono' question: do you know how to get a mono decoder output file?
I set the CT mono debug mode, but it seems that every time the program reaches
'interleaveSamples(&TimeDataFloat[0], &TimeDataFloat[frameSize], pTimeDataPcm, frameSize, &numChannels);'
numChannels changes from 1 to 2. If I force numChannels to stay at 1, I get mono output, but it sounds totally wrong...
I chose the 96 kbps mono setting. I am confused.
Looking forward to your reply.
Dynamic
post Nov 6 2012, 19:20
Post #4





Group: Members
Posts: 793
Joined: 17-September 06
Member No.: 35307



QUOTE (wind @ Nov 5 2012, 23:02) *
How about the 'mono' question, do you know how to get the mono decoder output file? [...]


I thought the whole idea of the special mono mode was this:

If the original encode was stereo, the AAC-LC part (low frequencies) must be decoded as stereo and downmixed to mono. However, you can save computational resources with the SBR layer by downmixing the components before conducting the band replication.

Pure speculation, but I wonder whether forcing numChannels to 1 is overriding the initial stereo decode of the LC layer (whose stereo information could be encoded in various ways, such as L-R or M-S stereo, for example), so that you occasionally get the M and occasionally the L, say, depending on what was chosen for each frame.

I'm not familiar enough with the 3GPP code to tell what the problem is. If you've already stripped away the SBR layer, it might be pointless to specify mono decoding, since it only does anything different in the SBR layer (which you've discarded); you're best off simply decoding as stereo and downmixing with any of the usual formulae, such as mono = (L+R)/2 or mono = (L+R)/sqrt(2).
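For the decode-as-stereo-then-downmix route, the (L+R)/2 formula is a one-liner over interleaved PCM. A hypothetical helper, not part of the 3GPP sources, assuming the usual L R L R ... interleaving:

```c
#include <stddef.h>

/* Downmix interleaved stereo (L R L R ...) to mono using
 * mono = (L + R) / 2.  Averaging keeps the result inside the
 * original range, so full-scale inputs cannot clip, unlike a
 * plain L + R sum. */
void downmix_stereo(const float *interleaved, size_t frames, float *mono) {
    for (size_t i = 0; i < frames; i++)
        mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
}
```

The (L+R)/sqrt(2) variant trades that guarantee for preserved perceived loudness of uncorrelated content; with it, correlated full-scale peaks can exceed full scale and need limiting or headroom.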
wind
post Nov 7 2012, 17:30
Post #5





Group: Members
Posts: 10
Joined: 1-November 12
Member No.: 104229



QUOTE (Dynamic @ Nov 6 2012, 19:20) *
I thought the whole idea of the special mono mode was this: [...] you're best to simply decode as stereo and downmix using any of the usual formulae such as mono=(L+R)/2 or mono=(L+R)/sqrt(2).

Thanks, Dynamic.
Sorry, I forgot to mention that in what I tried, the original encode is mono and the decoder debug mode is also mono, so there is no need to downmix.