LAME MP3 encoding starts fast then slows down
Kaze3
post Apr 27 2012, 10:17
Post #1





Group: Members
Posts: 2
Joined: 1-January 12
Member No.: 96160



Hi all.

I've been converting some FLAC files to MP3 using foobar and LAME 3.99.5 (64-bit). I find that the encoding starts at ~220x with 100% CPU usage, but after a short time (varies) it drops to ~60-80x, with the CPU usage dropping significantly as well. I'm wondering if I'm being limited by the hard disk speed - I've tried converting with the source and destination on the same drive, and also with the source on the HDD and the destination on my SSD, and I see the same behaviour with both. I've also tried the 32-bit LAME encoder and the one that's built into foobar (same behaviour).

Can anyone suggest whether something's going wrong or if I am indeed limited by my hardware (HDD or otherwise)? Thanks.
megar
post Apr 27 2012, 12:04
Post #2





Group: Members
Posts: 42
Joined: 22-May 03
From: Besancon, France
Member No.: 6749



Audio encoding is not very well suited for multi-threaded processes. Furthermore, you are decoding FLAC and re-encoding to MP3 in one operation. So I guess what you described is the correct behaviour. The good news is that you can transcode more than one file at a time, so that all the cores of your CPU are kept busy. There is a setting in foobar to tell it to run "n" conversions in parallel.
Kaze3
post Jun 12 2012, 19:42
Post #3





Group: Members
Posts: 2
Joined: 1-January 12
Member No.: 96160



Sorry I haven't replied for so long. I'm still having this issue and I came back across my own thread and realised that you've slightly misunderstood. I am actually already running four lame instances (one for each core). When I said 100% CPU usage I was referring to all of the cores being at 100%. When it slows down, the usage of all of the cores drops by approximately the same amount. I really don't understand why I can't constantly max out all of the cores.
Ouroboros
post Jun 12 2012, 20:05
Post #4





Group: Members
Posts: 289
Joined: 30-May 08
From: UK
Member No.: 53927



It's highly likely to be limited by disk I/O, either the actual disk, or the disk controller, or the Windows disk abstraction layer.
washu
post Jun 12 2012, 20:24
Post #5





Group: Members
Posts: 134
Joined: 16-February 03
From: Ottawa
Member No.: 5032



Try running a test conversion with both the source and destination on your SSD. If that works then you know the HD just can't keep up with 4 threads. If that still goes slow then post the make and model of your SSD. Not all SSDs are fast and you may simply have a slow one.

probedb
post Jun 12 2012, 21:56
Post #6





Group: Members
Posts: 1124
Joined: 6-September 04
Member No.: 16817



It's also not necessarily true that running any process is enough to max out 1 core on your machine. Just because it's not running at 100% CPU doesn't mean the encoding isn't going as fast as it can, surely?
pdq
post Jun 12 2012, 22:29
Post #7





Group: Members
Posts: 3309
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



The input file is much larger than the output, so if either should be moved to the SSD it should be the wav source file.
dumdidum
post Jun 12 2012, 22:40
Post #8





Group: Members
Posts: 57
Joined: 21-January 12
From: Germany
Member No.: 96595



QUOTE (Ouroboros @ Jun 12 2012, 21:05) *
It's highly likely to be limited by disk I/O, either the actual disk, or the disk controller, or the Windows disk abstraction layer.

I'm not so sure about that. Reading and writing of audio is sequential in nature. Sure, if you run the encoding multi-threaded, there will be multiple sequential reads and writes, which will slow things down. But are decoders and encoders really that fast so that disk access becomes the bottleneck?

i can say it's not the case on my desktop. i use sound converter (a small linux tool) on my quad-core i7-920 (eight logical cores due to hyper-threading) to transcode from FLAC to MP3, running eight threads. OS is on an SSD but audio data is read from and written to an HDD (seagate ST1000DM003). the disk is not the bottleneck. i previously used a considerably slower HDD (WD 750GB, 7200rpm, 3 platters) and i didn't get the impression it was slowing things down, either.

QUOTE (megar @ Apr 27 2012, 13:04) *
Audio encoding is not very well suited for multi-threaded processes.

that may be true (i'm not sure) but that's irrelevant. (EDIT: irrelevant in the context of this thread. OP wants to encode multiple files. it's probably safe to assume he wants to transcode a number of files larger than the number of cores his CPU has.) you just schedule multiple (serial) jobs to run concurrently.
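
For illustration, "multiple serial jobs run concurrently" could look something like the Python sketch below. It is only a sketch under assumptions: run_jobs is a made-up helper name, and the demo jobs are do-nothing Python processes standing in for real encoder commands (the file names in the comment are hypothetical).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, workers):
    """Run each command (a list of argv strings) as its own process,
    keeping at most `workers` of them running at the same time.
    Returns the exit codes in the same order as `commands`."""
    def run_one(argv):
        # Each encoder stays single-threaded; the parallelism comes
        # from having several of them alive at once.
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


# Stand-in jobs so the sketch runs anywhere. A real job might be
# ["lame", "--preset", "standard", "in.wav", "out.mp3"] (hypothetical paths).
demo_jobs = [[sys.executable, "-c", "pass"] for _ in range(8)]
print(run_jobs(demo_jobs, workers=4))
```

With more files than cores, a worker picks up the next file as soon as it finishes one, so every core stays busy as long as the disk can feed them.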

This post has been edited by dumdidum: Jun 12 2012, 22:55
Ouroboros
post Jun 12 2012, 23:16
Post #9





Group: Members
Posts: 289
Joined: 30-May 08
From: UK
Member No.: 53927



QUOTE (dumdidum @ Jun 12 2012, 22:40) *
I'm not so sure about that.

I am. It isn't just the encoding that's using the disk, it's everything else that the computer is doing at the same time. You also have to remember that your PC is optimised for - nothing at all. It's a general purpose machine, designed to provide an acceptable compromise under most circumstances. Not optimised for large writes or small writes, for huge concurrency or for single use, for games or for DTP, just designed to be OK most of the time.

The fact that your quad core i7 system works well for this particular task doesn't mean that all systems will. By the time you consider the motherboard, the chipset, the BIOS, the drivers, all things where there is room for individual manufacturers to make different decisions about what they think will make their machine better / faster / cheaper, then it's easy to see how one small component can become a bottleneck.

You also have to remember that anything that uses any sort of interrupt to get work done will not scale well if there are multiple processes trying to interrupt at the same time, and that the scaling isn't linear. A disk system may be able to cope with one thread writing at 1GB/s, but could easily struggle with 5 threads each trying to write at 100 MB/s.
saratoga
post Jun 12 2012, 23:26
Post #10





Group: Members
Posts: 4718
Joined: 2-September 02
Member No.: 3264



QUOTE (dumdidum @ Jun 12 2012, 17:40) *
QUOTE (Ouroboros @ Jun 12 2012, 21:05) *
It's highly likely to be limited by disk I/O, either the actual disk, or the disk controller, or the Windows disk abstraction layer.

I'm not so sure about that. Reading and writing of audio is sequential in nature. Sure, if you run the encoding multi-threaded, there will be multiple sequential reads and writes, which will slow things down. But are decoders and encoders really that fast so that disk access becomes the bottleneck?


Well, since he told you how fast he's encoding, and FLAC is about 700 kbps while LAME is probably about 200 kbps, you can do the math yourself:

4*220*700 kbps *1/8 bytes/bit = 77MB/s read
4*220*200 kbps *1/8 bytes/bit = 22MB/s write

So basically, it's bottlenecked if you can't sustain about 100 MB/s total throughput across about 8 files concurrently. Which realistically means it's bottlenecked if you have a magnetic HD.
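
For anyone who wants to replay that arithmetic with their own numbers, here is a minimal Python sketch. The function name is made up for illustration, and the inputs (4 jobs, 220x per job, roughly 700 kbps FLAC in, roughly 200 kbps MP3 out) are just the estimates from this post, not measurements.

```python
def disk_rate_mb_per_s(jobs, speed_x, bitrate_kbps):
    """Aggregate data rate for `jobs` parallel streams, each processed at
    `speed_x` times real time, for audio stored at `bitrate_kbps`."""
    kbits_per_s = jobs * speed_x * bitrate_kbps
    return kbits_per_s / 8 / 1000  # kbit/s -> kB/s -> MB/s


read_rate = disk_rate_mb_per_s(4, 220, 700)
write_rate = disk_rate_mb_per_s(4, 220, 200)
print(f"read {read_rate:.0f} MB/s, write {write_rate:.0f} MB/s, "
      f"total {read_rate + write_rate:.0f} MB/s")
```

The whole estimate hinges on reading 220x as a per-job speed; if 220x is the combined speed of all four jobs, every figure drops to a quarter.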
dumdidum
post Jun 12 2012, 23:37
Post #11





Group: Members
Posts: 57
Joined: 21-January 12
From: Germany
Member No.: 96595



QUOTE (Ouroboros @ Jun 13 2012, 00:16) *
QUOTE (dumdidum @ Jun 12 2012, 22:40) *
I'm not so sure about that.

I am. It isn't just the encoding that's using the disk, it's everything else that the computer is doing at the same time.

not on my desktop aorn but on a laptop. i just encoded a single wav (1.05gb) to mp3, preset insane, using lame 3.99.5 64-bit. it took 188 seconds. the resulting mp3 is 257mb. so in total, about 1.3gb needed to be read and written. that's about 7MB/s. even if you transcoded four or eight files at the same time, and even if you used a somewhat more powerful processor, that's absolutely no problem for a modern hard drive. the seagate barracuda ST1000DM003 in my desktop scores 197 MB/s sequential read speed and 196 MB/s sequential write speed in a benchmark.

i mean, the above is just a back-of-the-envelope calculation but it suggests disk access is not the bottleneck. anyways, i guess the proper way to test this would entail caching source and destination files in RAM.

edit: needless to say, less needs to be read if you use FLAC rather than WAV.
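
The same back-of-the-envelope figure as a small Python sketch, for anyone who wants to plug in their own file sizes and timings. It assumes 1 GB = 1000 MB and simply multiplies by the job count, which ignores the turbo-boost and seek effects discussed elsewhere in the thread; the function name is made up.

```python
def avg_io_mb_per_s(input_mb, output_mb, seconds):
    """Average combined read + write rate over one whole encode."""
    return (input_mb + output_mb) / seconds


# 1.05 GB WAV in, 257 MB MP3 out, 188 seconds (figures from the post above).
single_job = avg_io_mb_per_s(1.05 * 1000, 257, 188)
print(f"one job: {single_job:.1f} MB/s")

# Naive scaling to several simultaneous encodes on the same machine.
for jobs in (4, 8):
    print(f"{jobs} jobs: {jobs * single_job:.1f} MB/s")
```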

This post has been edited by dumdidum: Jun 12 2012, 23:39
Ouroboros
post Jun 13 2012, 00:02
Post #12





Group: Members
Posts: 289
Joined: 30-May 08
From: UK
Member No.: 53927



QUOTE (dumdidum @ Jun 12 2012, 23:37) *
the seagate barracuda ST1000DM003 in my desktop scores 197 MB/s sequential read speed and 196 MB/s sequential write speed in a benchmark.

Absolutely - for a single thread writing! As soon as you ask it to schedule multiple threads then the straight throughput isn't the issue, the problem becomes concurrency and queuing. Disk manufacturers spend loads of time writing the firmware on the disk to buffer, queue and optimise reads and writes to give them huge throughputs, but there is always a compromise, and if you give the disk enough data to write simultaneously then you can make it choke. It doesn't matter how fast you can run, you can't run in two directions at the same time - and that's what multiple writes can sometimes (metaphorically) ask the disk to do. :)
dumdidum
post Jun 13 2012, 00:04
Post #13





Group: Members
Posts: 57
Joined: 21-January 12
From: Germany
Member No.: 96595



QUOTE (saratoga @ Jun 13 2012, 00:26) *
4*220*700 kbps *1/8 bytes/bit = 77MB/s read
4*220*200 kbps *1/8 bytes/bit = 22MB/s write

are we sure that a modern cpu does lame --preset standard at 220x sustained in the absence of any disk i/o limitations? especially if we consider running multiple concurrent jobs such that no high turbo-boost kicks in? i somewhat (EDIT: seriously) doubt that figure...

This post has been edited by dumdidum: Jun 13 2012, 00:07
saratoga
post Jun 13 2012, 00:54
Post #14





Group: Members
Posts: 4718
Joined: 2-September 02
Member No.: 3264



QUOTE (dumdidum @ Jun 12 2012, 19:04) *
QUOTE (saratoga @ Jun 13 2012, 00:26) *
4*220*700 kbps *1/8 bytes/bit = 77MB/s read
4*220*200 kbps *1/8 bytes/bit = 22MB/s write

are we sure that a modern cpu does lame --preset standard at 220x sustained in the absence of any disk i/o limitations?


If you think the OP is wrong, feel free to provide numbers disproving his claims.
washu
post Jun 13 2012, 01:05
Post #15





Group: Members
Posts: 134
Joined: 16-February 03
From: Ottawa
Member No.: 5032



QUOTE (dumdidum @ Jun 12 2012, 18:37) *
the seagate barracuda ST1000DM003 in my desktop scores 197 MB/s sequential read speed and 196 MB/s sequential write speed in a benchmark.


How fast is that drive in random I/O? A whole 1.1 MB/s. The more concurrent threads you run the closer to random performance you will get.
dumdidum
post Jun 13 2012, 01:06
Post #16





Group: Members
Posts: 57
Joined: 21-January 12
From: Germany
Member No.: 96595



QUOTE (saratoga @ Jun 13 2012, 00:26) *
4*220*700 kbps *1/8 bytes/bit = 77MB/s read
4*220*200 kbps *1/8 bytes/bit = 22MB/s write

a modern cpu does lame nowhere near 220x speed. according to this bench, a core i7-3820 overclocked to 4.625 GHz encodes the terminator 2 soundtrack (about 50min) in 69s to -b 160 using lame 32-bit. that's a really, really fast CPU running a single thread (overclocked and with turbo boost enabled), doing about 50x. and the test is run off an Intel SSD 510 250 GB, SATA 6 Gb/s.

so the 220x the OP mentions is completely unrealistic. so the realistic figure is about one-fourth or one-fifth as large as what saratoga computed under the assumption that 220x is doable. thus, with four threads, we require about 20MB/s disk read/write. there's no reason to believe a modern hard drive won't handle that. each i/o is sequential in nature and the queue isn't deep.
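
A quick Python sketch of how these numbers fit together, in case it helps. It reuses saratoga's bitrate guesses (about 700 kbps in, about 200 kbps out) as assumptions, and the function names are made up. Note that 50 minutes in 69 seconds works out closer to 43x than 50x, which only lowers the required disk rate further.

```python
def speed_factor(audio_seconds, encode_seconds):
    """How many times faster than real time the encode ran."""
    return audio_seconds / encode_seconds


def total_io_mb_per_s(jobs, speed_x, in_kbps, out_kbps):
    """Combined read + write rate needed to keep `jobs` encoders fed."""
    return jobs * speed_x * (in_kbps + out_kbps) / 8 / 1000


measured_x = speed_factor(50 * 60, 69)  # benchmark quoted above
print(f"benchmark speed: {measured_x:.1f}x")

# Per-job speed of ~43x (from the benchmark), 50x (rounded) and 220x.
for x in (measured_x, 50, 220):
    rate = total_io_mb_per_s(4, x, 700, 200)
    print(f"4 jobs at {x:.0f}x each: {rate:.1f} MB/s")
```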
saratoga
post Jun 13 2012, 02:26
Post #17





Group: Members
Posts: 4718
Joined: 2-September 02
Member No.: 3264



QUOTE (dumdidum @ Jun 12 2012, 20:06) *
so the 220x the OP mentions is completely unrealistic. so the realistic figure is about one-fourth or one-fifth as large as what saratoga computed


So basically, you're saying that its 220x is across all 4 CPUs?
dumdidum
post Jun 13 2012, 08:42
Post #18





Group: Members
Posts: 57
Joined: 21-January 12
From: Germany
Member No.: 96595



QUOTE (saratoga @ Jun 13 2012, 03:26) *
So basically, you're saying that its 220x is across all 4 CPUs?

Ultimately, yes. Even on a pretty fast off-the-shelf desktop PC, a serial LAME encode will not achieve a speed of higher than 60x or 70x. And if you run multiple LAME encodes simultaneously (such as running four jobs on a four-core CPU), the speed of each encode will actually be a bit slower. this has to do with the turbo boost feature of modern desktop CPUs (clock speed will be lower if there's load on all cores than when there's load on only a single core).

so, yeah, what it boils down to is that, realistically, it's more like 220x across all 4 cores.

This post has been edited by dumdidum: Jun 13 2012, 08:43
skamp
post Jun 13 2012, 09:29
Post #19





Group: Developer
Posts: 1344
Joined: 4-May 04
From: France
Member No.: 13875



QUOTE (dumdidum @ Jun 12 2012, 23:40) *
are decoders and encoders really that fast so that disk access becomes the bottleneck?


The faster the CPU, the faster the codec, and the slower the HDD, the more of a bottleneck the latter will be. Encoding WAV files to FLAC at the default compression level on my machine shows a substantial difference in encoding time when caching the input files sequentially (see here).

The smaller size of FLACs compared to WAV (less reading), and the relatively slow encoding speed of LAME, mitigate the need for fast I/O, though the end result depends entirely on your hardware (and software) configuration. Caching input files sequentially is still beneficial in most cases where a (slowish) HDD is involved.
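
As a rough illustration of "caching input files sequentially", the Python sketch below reads each file once, front to back, one at a time, before any parallel encoders are started. This is an assumption about one way to do it, not a description of any particular tool: warm_cache is a made-up name, and whether the data actually stays in RAM is up to the operating system and how much free memory there is.

```python
import os
import tempfile


def warm_cache(paths, chunk_size=1 << 20):
    """Read every file sequentially, one file at a time, so the OS page
    cache (if it has room) can serve later reads from RAM.
    Returns the total number of bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


# Tiny demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"\0" * (3 * 1024 * 1024 + 5))
    print(warm_cache([demo_path]), "bytes read")
```

The point of reading one file at a time is that the HDD sees a single sequential stream instead of several interleaved ones; the encoders can then run in parallel against data that is (hopefully) already in memory.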

lvqcl also mentioned this related thread.

QUOTE (saratoga @ Jun 13 2012, 03:26) *
So basically, you're saying that its 220x is across all 4 CPUs?


That sounds about right: I get about 185x when encoding WAVs from a ramdisk to LAME --preset standard on my 2.2 GHz Core i7 (8 threads).


--------------------
caudec.net
