Full Version: How can I get Lame to encode 64kbps speech as well as mp3sEncoder? (sa
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - General
woudy
Hi Folks,

I've found Lame is creating this glaring warbling/sparkling artifact in 64kbps mono speech that is absent when using Fraunhofer's mp3sEncoder. Can someone tell me what's wrong with Lame (or my settings) in this case?

The particulars:

Input sample: http://minidisc.org/mp3tests/jg_test.wav (44.1khz stereo speech, 0:30s)

FhG output: http://minidisc.org/mp3tests/jg_test_FhG.mp3
Produced using: mp3sEncoder.exe -if jg_test.wav -of jg_test_FhG.mp3 -br 64000 -q 1 -m 0 -mono
Labelled: Fraunhofer IIS MP3 Surround Commandline Encoder V1.4, Encoder-Library V04.01.00 (build 2007-05-18)
Downloaded from: http://www.all4mp3.com/tools/sw_fhg_cl.html

Lame output: http://minidisc.org/mp3tests/jg_test_lame_64kbps.mp3
Produced using: lame -b 64 -h -m m jg_test.wav jg_test_lame_64kbps.mp3
Labelled: LAME version 3.96.1 (http://lame.sourceforge.net/)

Just listen to these two samples with headphones on, the difference is quite stark. It's like there's simply some bug in the Lame encoder.

Thank you for your help.

Eric Woudenberg

Skylined ;)~
exactly which version/compile of LAME are you using?
Gabriel
First, why not try LAME 3.97, which was released nearly a year ago, to check whether your problem is still there with the current version? (3.96.1 was released 3 years ago.)
Alex B
Thanks woudy, you have provided a very interesting speech sample. The male voice has a low base tone starting from 80 Hz, but surprisingly the high harmonics reach up to the 16-22 kHz range. In addition, some microphone handling noises produce frequency content down to 10 Hz. Actually, just by looking at a frequency spectrum display one could think that it is an instrumental music sample.

I noticed that LAME 3.97 @ --preset 64 -m m produces slightly better quality than your 3.96.1 sample has, but only slightly.

LAME uses a 16.5 kHz lowpass filter for 64 kbps CBR and ABR mono files. The FhG sample has a lowpass filter at about 13.5 kHz applied, so I tried a similar lowpass setting with LAME 3.97, which improved quality a bit further. This setting triggers LAME to use a 32 kHz sample rate, which seems to be better with this sample.

After that I tried LAME 3.98b4 @ --preset 60 -m m --lowpass 13.5 and, to my surprise, noticed that it is worse than LAME 3.97. (3.98b4 produced a higher bitrate than 3.97 with the same ABR setting, so I had to lower the preset slightly, from 64 to 60, to get a matching file size.)

After that I did a proper ABC-HR test:

1. Lame 3.97 --preset 64 -m m --lowpass 13.5 (64 kbps, 241 kB)
2. your FhG sample
3. Lame 3.98 beta 4 --preset 60 -m m --lowpass 13.5 (64 kbps, 241 kB)

I decoded the samples with foobar2000 0.9.4.3 and after that converted them to 44.1 kHz stereo with Adobe Audition 2.0 using the best quality resampling setting.

CODE
ABC/HR for Java, Version 0.5b, August 11, 2007
Testname: speech

Tester: Alex B

1L = U:\test\LAME speech problem\jg_test L3.97 lp13.5.wav
2L = U:\test\LAME speech problem\jg_test_FhG.wav
3R = U:\test\LAME speech problem\jg_test L3.98 lp13.5.wav

---------------------------------------
General Comments:
---------------------------------------
1L File: U:\test\LAME speech problem\jg_test L3.97 lp13.5.wav
1L Rating: 3.5
1L Comment:
---------------------------------------
2L File: U:\test\LAME speech problem\jg_test_FhG.wav
2L Rating: 4.5
2L Comment:
---------------------------------------
3R File: U:\test\LAME speech problem\jg_test L3.98 lp13.5.wav
3R Rating: 2.5
3R Comment:
---------------------------------------

ABX Results:
Original vs U:\test\LAME speech problem\jg_test_FhG.wav
8 out of 8, pval = 0.0030
Original vs U:\test\LAME speech problem\jg_test L3.97 lp13.5.wav
8 out of 8, pval = 0.0030
Original vs U:\test\LAME speech problem\jg_test L3.98 lp13.5.wav
8 out of 8, pval = 0.0030


---- Detailed ABX results ----
Original vs U:\test\LAME speech problem\jg_test_FhG.wav
Playback Range: 23.699 to 24.870
10:45:35 AM p 1/1 pval = 0.5
10:46:04 AM p 2/2 pval = 0.25
10:46:43 AM p 3/3 pval = 0.125
10:46:55 AM p 4/4 pval = 0.062
10:48:09 AM p 5/5 pval = 0.031
10:49:23 AM p 6/6 pval = 0.015
10:49:50 AM p 7/7 pval = 0.0070
10:52:40 AM p 8/8 pval = 0.0030

Original vs U:\test\LAME speech problem\jg_test L3.97 lp13.5.wav
Playback Range: 01.757 to 04.325
10:04:34 AM p 1/1 pval = 0.5
Playback Range: 01.757 to 02.990
10:05:18 AM p 2/2 pval = 0.25
Playback Range: 01.757 to 03.262
10:05:53 AM p 3/3 pval = 0.125
10:06:01 AM p 4/4 pval = 0.062
10:06:23 AM p 5/5 pval = 0.031
10:06:36 AM p 6/6 pval = 0.015
10:06:56 AM p 7/7 pval = 0.0070
10:07:10 AM p 8/8 pval = 0.0030

Original vs U:\test\LAME speech problem\jg_test L3.98 lp13.5.wav
Playback Range: 24.141 to 25.120
10:27:40 AM p 1/1 pval = 0.5
10:27:46 AM p 2/2 pval = 0.25
10:27:53 AM p 3/3 pval = 0.125
10:28:01 AM p 4/4 pval = 0.062
10:28:28 AM p 5/5 pval = 0.031
10:28:46 AM p 6/6 pval = 0.015
10:29:30 AM p 7/7 pval = 0.0070
10:29:40 AM p 8/8 pval = 0.0030


LAME 3.98b4 shows a clear regression when compared with LAME 3.97.

FhG is very very good with this sample. I had great difficulties ABXing it, though I succeeded after a considerable amount of practicing.
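
For reference, the pval figures in the ABX log above are consistent with a one-sided binomial test, i.e. the chance of getting at least that many trials right by pure guessing. The sketch below is only an illustration: the function names are made up here, and the idea that ABC/HR truncates the value to three decimals is inferred from the logged numbers, not taken from the tool's documentation.

```python
from math import comb, floor


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing (p = 0.5)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


def truncate(value: float, places: int = 3) -> float:
    """Cut off (do not round) after `places` decimals.

    Assumption: this mimics how the log displays pval.
    """
    factor = 10 ** places
    return floor(value * factor) / factor


# Reproduce the running pval column for a perfect 8/8 run.
for n in range(1, 9):
    p = abx_p_value(n, n)
    print(f"{n}/{n}  exact = {p:.6f}  truncated = {truncate(p):.3f}")
```

With 8 out of 8 the exact value is 0.5^8, a bit under 0.004, so each of the three comparisons is well below the usual 0.05 threshold.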

In addition, I quickly tried Vorbis b5 @ -q 0.9 mono (65 kbps) and it was roughly on par with FhG; both encoders produce fine results with this sample.


Gabriel, I think this sample could be useful for the LAME development.


EDIT

I forgot to mention that I also tried LAME 3.97 CBR at 64 kbps, but ABR was obviously slightly better, so I didn't bother to include LAME CBR in my ABC-HR test.
haregoo
At low bitrate speech encoding, VBR helps a lot. I'd recommend -V 5 (to 8) --vbr-new with LAME v3.97.
kjoonlee
Using the original sample, I can detect worse artifacts in Speex 1.2 beta 1, at 44.1 kHz and 32 kHz. Curiously, the results are better when the source is resampled to 16 kHz.
woudy
QUOTE(Alex B @ Aug 11 2007, 04:59) *
1L File: U:\test\LAME speech problem\jg_test L3.97 lp13.5.wav
1L Rating: 3.5

2L File: U:\test\LAME speech problem\jg_test_FhG.wav
2L Rating: 4.5

3R File: U:\test\LAME speech problem\jg_test L3.98 lp13.5.wav
3R Rating: 2.5

Alex, thanks so much for your very thorough investigation, I really appreciate it!
Alex B
Naturally those ratings are just my opinion, but the obvious artifacts were clearly more pronounced in the LAME 3.98b4 sample. The FhG sample didn't have any obvious artifacts.

We must remember that this was only one sample & only one listener & only certain settings, so my test does not prove anything about the possible quality differences in general. However, I hope the LAME developers can find out why the newer beta version produced lower quality in this case.

Encspot Pro tells the following about the samples:

FhG
Long blocks: 98.3% - Short blocks: 1.7%
CODE
Bitrates:
----------------------------------------------------
64 |||||||||||||||||||||||||||||||||||||||| 99.9%
----------------------------------------------------

Type : mpeg 1 layer III
Bitrate : 64
Mode : mono
Frequency : 44100 Hz
Frames : 1170
ID3v2 Size : 0
First Frame Pos : 0
Length : 00:00:30
Max. Reservoir : 511
Av. Reservoir : 380
Emphasis : none
Scalefac : 7.6%
Bad Last Frame : no
Encoder : FhG (ACM or producer pro)
Lame Header : No


LAME 3.97
Long blocks: 86.6% - Short blocks: 13.4%
CODE
Bitrates:
----------------------------------------------------
32 | 1.4%
48 |||| 6.1%
56 |||||||||||||||||||||||||||||||||||||||| 50.4%
64 ||||||||||||||||| 22.2%
80 ||||||| 10.0%
96 ||||| 6.8%
112 | 1.8%
128 0.6%
160 0.6%
----------------------------------------------------

Type : mpeg 1 layer III
Bitrate : 64
Mode : mono
Frequency : 32000 Hz
Frames : 851
ID3v2 Size : 0
First Frame Pos : 0
Length : 00:00:30
Max. Reservoir : 511
Av. Reservoir : 26
Emphasis : none
Scalefac : 15.3%
Bad Last Frame : no
Encoder : Lame 3.97

Lame Header:

Quality : 57
Version String : Lame 3.97
Tag Revision : 0
VBR Method : abr
Lowpass Filter : 13500
Psycho-acoustic Model : nspsytune
Safe Joint Stereo : no
nogap (continued) : no
nogap (continuation) : no
ATH Type : 4
ABR Bitrate : 64
Noise Shaping : 2
Stereo Mode : Mono
Unwise Settings Used : no
Input Frequency : 44.1kHz


LAME 3.98b4
Long blocks: 86.6% - Short blocks: 13.4%
CODE
Bitrates:
----------------------------------------------------
32 | 1.9%
40 0.6%
48 |||||||| 12.5%
56 |||||||||||||||||||||||||||||||||||||||| 57.1%
64 |||| 7.1%
80 ||||| 8.1%
96 |||| 6.6%
112 | 2.7%
128 | 1.6%
160 | 1.8%
----------------------------------------------------

Type : mpeg 1 layer III
Bitrate : 64
Mode : mono
Frequency : 32000 Hz
Frames : 851
ID3v2 Size : 0
First Frame Pos : 0
Length : 00:00:30
Max. Reservoir : 511
Av. Reservoir : 26
Emphasis : none
Scalefac : 25.4%
Bad Last Frame : no
Encoder : Lame 3.98 (beta)

Lame Header:

Quality : 57
Version String : Lame 3.98 (beta)
Tag Revision : 0
VBR Method : abr
Lowpass Filter : 13500
Psycho-acoustic Model : nspsytune
Safe Joint Stereo : no
nogap (continued) : no
nogap (continuation) : no
ATH Type : 4
ABR Bitrate : 60
Noise Shaping : 2
Stereo Mode : Mono
Unwise Settings Used : no
Input Frequency : 44.1kHz


The bitrate distribution is quite different between the LAME versions, so obviously something has changed.
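
To put numbers on that, here is a rough sketch that takes the two Encspot histograms above and computes the mean bitrate and the share of frames at or below 56 kbps. It assumes the percentages are per-frame counts; since every frame in a file covers the same amount of time, the percentage-weighted mean is then the average bitrate. The function and variable names are just for illustration.

```python
def mean_bitrate_kbps(histogram: dict) -> float:
    """Percentage-weighted mean bitrate; normalises in case the shares don't sum to 100."""
    total_pct = sum(histogram.values())
    return sum(kbps * pct for kbps, pct in histogram.items()) / total_pct


def share_at_or_below(histogram: dict, limit_kbps: int) -> float:
    """Percentage of frames whose bitrate is `limit_kbps` or lower."""
    return sum(pct for kbps, pct in histogram.items() if kbps <= limit_kbps)


# Frame bitrate (kbps) -> share of frames (%), copied from the Encspot output above.
lame_397 = {32: 1.4, 48: 6.1, 56: 50.4, 64: 22.2, 80: 10.0,
            96: 6.8, 112: 1.8, 128: 0.6, 160: 0.6}
lame_398b4 = {32: 1.9, 40: 0.6, 48: 12.5, 56: 57.1, 64: 7.1,
              80: 8.1, 96: 6.6, 112: 2.7, 128: 1.6, 160: 1.8}

for name, hist in (("3.97", lame_397), ("3.98b4", lame_398b4)):
    print(f"LAME {name}: mean {mean_bitrate_kbps(hist):.1f} kbps, "
          f"{share_at_or_below(hist, 56):.1f}% of frames at or below 56 kbps")
```

Both files come out at roughly 64 kbps on average, but 3.98b4 puts a larger share of its frames at 56 kbps or below and far fewer at 64 kbps.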

Also, I wonder if the short block usage has something to do with the quality differences. LAME uses a lot more short blocks.
robert
Whether the heavier use of short blocks is the problem or not, you can check by comparing with 3.98b4 -V[7,6,5] -mm --lowpass 13.5.
Alex B
QUOTE(haregoo @ Aug 11 2007, 14:26) *
At low bitrate speech encoding, VBR helps a lot. I'd recommend -V 5 (to 8) --vbr-new with LAME v3.97.

The general consensus at HA has been that ABR is more robust at low bitrates, though I don't think that has been properly tested with the LAME 3.97 and 3.98b4 versions.

I could try a VBR setting, but first I would need to find out which VBR mono setting (if any) would produce an average bitrate of 64 kbps when a large number of various speech files are encoded.

Also, a predictable bitrate is often a requirement when speech files are distributed.
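
Finding that VBR setting could be scripted roughly as follows. This is only a sketch under assumptions: it expects a `lame` binary on the PATH, uses only options that appear in this thread (-V, -m m, --lowpass), reads durations with Python's standard `wave` module (so plain PCM WAV input only), and treats MP3 file size divided by duration as the average bitrate, which slightly overstates it because of the tag frame. The function names and the output folder are made up for the example.

```python
import subprocess
import wave
from pathlib import Path


def lame_vbr_command(level, src, dst):
    """Build a mono VBR command line using only options quoted in this thread."""
    return ["lame", f"-V{level}", "-m", "m", "--lowpass", "13.5", str(src), str(dst)]


def wav_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def average_kbps(total_bytes, total_seconds):
    """Average bitrate implied by a total size and a total duration."""
    return total_bytes * 8 / total_seconds / 1000


def sweep(wav_files, levels=(5, 6, 7), out_dir=Path("vbr_sweep")):
    """Encode every file at every -V level; return {level: corpus-wide average kbps}."""
    out_dir.mkdir(exist_ok=True)
    total_seconds = sum(wav_seconds(f) for f in wav_files)
    results = {}
    for level in levels:
        total_bytes = 0
        for src in wav_files:
            dst = out_dir / f"{Path(src).stem}_V{level}.mp3"
            subprocess.run(lame_vbr_command(level, src, dst), check=True)
            total_bytes += dst.stat().st_size
        results[level] = average_kbps(total_bytes, total_seconds)
    return results
```

Calling `sweep(sorted(Path("speech_corpus").glob("*.wav")))` on a folder of speech files (the folder name is hypothetical) would return one average per -V level, and the level closest to 64 kbps would be the one to compare against ABR.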
Alex B
QUOTE(robert @ Aug 11 2007, 23:58) *
Whether the heavier use of short blocks is the problem or not, you can check by comparing with 3.98b4 -V[7,6,5] -mm --lowpass 13.5.

I converted the file using -V5 -mm --lowpass 13.5 (-V6 was too low, it produced only 56 kbps).

Encspot reports this:

Long blocks: 86.3% - Short blocks: 13.7%
CODE
Bitrates:
----------------------------------------------------
32 | 1.4%
40 0.2%
48 ||||||||||||||| 16.5%
56 |||||||||||||||||||||||||||||||||||||||| 42.9%
64 ||||||||||||||| 16.2%
80 |||||| 7.1%
96 |||||| 7.4%
112 ||| 4.1%
128 || 3.2%
160 0.9%
----------------------------------------------------

Type : mpeg 1 layer III
Bitrate : 65
Mode : mono
Frequency : 32000 Hz
Frames : 851
ID3v2 Size : 0
First Frame Pos : 0
Length : 00:00:30
Max. Reservoir : 511
Av. Reservoir : 27
Emphasis : none
Scalefac : not used
Bad Last Frame : no
Encoder : Lame 3.98 (beta)

Lame Header:

Quality : 50
Version String : Lame 3.98 (beta)
Tag Revision : 0
VBR Method : vbr-mtrh
Lowpass Filter : 13500
Psycho-acoustic Model : nspsytune
Safe Joint Stereo : no
nogap (continued) : no
nogap (continuation) : no
ATH Type : 4
ABR Bitrate : 32
Noise Shaping : 1
Stereo Mode : Mono
Unwise Settings Used : no
Input Frequency : 44.1kHz

The bitrate distribution is quite similar with the ABR file.
I may try to test it tomorrow.

Edit

I meant to say that the short block amount is similar with the ABR files. It's late here... :)
woudy
QUOTE(Alex B @ Aug 11 2007, 15:58) *
Also, a predictable bitrate is often a requirement when speech files are distributed.
In our case this audio is intended for streaming or download, so ~64 kbps is fine. However, we do need 44.1 (or 22.05) kHz output, due to the well-known "chipmunk" bug in the Adobe Flash embedded mp3 player we use.
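
One possible way to keep the ABR and lowpass settings from Alex B's test while forcing the output rate would be LAME's --resample option, which takes the output sampling frequency in kHz. The sketch below only builds the command line. The function name and its defaults are assumptions for illustration, and whether 44.1 kHz output sounds as good as the 32 kHz encodes on this sample has not been tested here.

```python
def lame_abr_command(src, dst, abr_kbps=64, lowpass_khz=13.5, out_rate_khz=44.1):
    """Build a mono ABR command line with a fixed output sampling rate."""
    # The lowpass has to sit below half the output rate (the Nyquist limit).
    if lowpass_khz >= out_rate_khz / 2:
        raise ValueError("lowpass must be below half the output sample rate")
    return [
        "lame",
        "--preset", str(abr_kbps),         # ABR preset, as used earlier in the thread
        "-m", "m",                         # mono
        "--lowpass", str(lowpass_khz),     # kHz
        "--resample", str(out_rate_khz),   # output sampling frequency in kHz
        str(src), str(dst),
    ]


print(" ".join(lame_abr_command("jg_test.wav", "jg_test_lame_44k.mp3")))
```

The output file name above is made up. For 22.05 kHz output the 13.5 kHz lowpass would have to come down, since it is above the 11.025 kHz Nyquist limit.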

robert
QUOTE(Alex B @ Aug 11 2007, 23:26) *
QUOTE(robert @ Aug 11 2007, 23:58) *
Whether the heavier use of short blocks is the problem or not, you can check by comparing with 3.98b4 -V[7,6,5] -mm --lowpass 13.5.
I converted the file using -V5 -mm --lowpass 13.5 (-V6 was too low, it produced only 56 kbps).
...
I may try to test it tomorrow.

Did you find some time to test the VBR encoded file?
woudy
QUOTE(Alex B @ Aug 11 2007, 04:59) *
Gabriel, I think this sample could be useful for the LAME development.

Thank you Alex. I have gotten permission from the speaker ("jg") to include his speech in your test corpus.

Eric Woudenberg
