Full Version: Lame and SMP
sven_Bent
I was wondering if some compiler guru could make a LAME version that would support SMP/Hyperthreading. Or is the LAME data processing structure not suited to that?
dev0
Just start two lame processes at the same time to benefit from Hyperthreading/SMP.
You can even set up EAC to do so for you.
An option in current frontends should be relatively easy to implement (Number of Processes: N).
dev0
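A minimal sketch of the "Number of Processes: N" idea, in Python: keep at most N encoder processes running at once, one per file. The names `encode_all` and `lame_cmd` are made up for illustration, and `lame_cmd` assumes a `lame` binary on the PATH that takes an input and an output filename. This is not how EAC or any existing frontend actually does it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def lame_cmd(path):
    # Assumption: a `lame` binary is on the PATH and is called as
    # `lame <input> <output>`.
    return ["lame", path, path + ".mp3"]


def encode_all(files, build_cmd=lame_cmd, workers=2):
    """Run one encoder process per file, at most `workers` at a time.

    Returns a dict mapping each input file to its process exit code.
    """
    def run(path):
        # Each worker thread only waits on its own child process, so the
        # real work happens in separate OS processes that the scheduler
        # can place on different (physical or logical) CPUs.
        return path, subprocess.run(build_cmd(path)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, files))
```

Something like `encode_all(["track01.wav", "track02.wav"], workers=2)` would then keep two encoders busy until the list is exhausted.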
sven_Bent
lol
That would be the ideal solution regarding SMP :-)
However, it is hard to tell the ripper program that it needs to compress two files at a time and not just one.
DarkAngel
I think what dev0 means is that EAC can already do it.

Just go to EAC -> EAC Options -> Use x compressor threads.

I use that option on a 'physical' SMP system (Athlon MP), and it works out to be about 1.7x normal speed.
However, don't expect anything amazing from HT. In some cases it's slower, in others, faster. Have a try.

If you just want to encode a single track, give gogonocoda a try, though it hasn't been updated for a while, IIRC. It is, however, built from the LAME sources.
schuberth
I did a massive ripping/encoding session using EAC&LAME on a dual Celeron 533 some time ago (almost a year). The results were very good.

IIRC, when running only one instance of LAME it was using more than 50% CPU, which suggests that some tasks did run in parallel (paralell? I hate spelling). So it's no wonder that on duals you get only 1.7 times the performance.

As for HT (I'm no expert, just speculating), I think that running two or more instances of LAME wouldn't bring a big gain in performance, because HT only enables using different parts of the CPU in parallel. So performance-wise (I think) it's best to use a mix of different programs with HT.

BTW, you can squeeze some more juice out of your dual (multi)proc if you start '# of procs'+1 instances of your favourite number cruncher. That way you can be sure that all procs are 99% loaded. =P
Mike Giacomelli
QUOTE
However it is hard to tell the ripper programs that it needs to compress two files at a time and not just one.


EAC has been doing this for ages.
sony666
Depends on how LAME is written... since it's plain ANSI C, I don't think it's prepared for multithreading at all.
Just running two instances of it is not what I call "Hyperthreading" though :)
experttech
Exactly, running two instances of a process is not Hyperthreading. The code has to be written specifically to make use of Hyperthreading.
Gabriel
Lame is single threaded, and will probably stay single threaded for a long time.
Emanuel
QUOTE (DarkAngel @ May 21 2003 - 08:44 PM)
I use that option on a 'physical' SMP system (Athlon MP), and it works out to be about 1.7x normal speed.

Will this be valid even for other encoders, such as oggenc.exe (and flac.exe)?
Garf
QUOTE (experttech @ May 22 2003 - 06:06 AM)
Exactly, running two instances of a process is not Hyperthreading. The code has to be written specifically to make use of Hyperthreading.

Nonsense, anything that can run multiple instances at the same time can make use of hyperthreading.

You don't have to use 'threads' to effectively use 'hyperthreading'.
Gabriel
Hyperthreading is simulating 2 processors, so it will help even in case of multiple processes.

But it could be more optimal if a given multithread process would use the specific hyperthreading opcodes.
sven_Bent
QUOTE (DarkAngel @ May 21 2003 - 08:44 PM)
I think what dev0 means is that EAC can already do it.

Just go to EAC -> EAC Options -> Use x compressor threads.

I use that option on a 'physical' SMP system (Athlon MP), and it works out to be about 1.7x normal speed.
However, don't expect anything amazing from HT. In some cases it's slower, in others, faster. Have a try.

If you just want to encode a single track, give gogonocoda a try, though it hasn't been updated for a while, IIRC. It is, however, built from the LAME sources.

Damn, then I need to change to EAC soon.
I always end up confused when trying to set up EAC and go back to Easy CD-DA Extractor.

Maybe I should go look in the FAQ for EAC setup.
DarkAngel
QUOTE
Will this be valid even for other encoders, such as oggenc.exe (and flac.exe)?


In my experience, starting 2 instances of a program such as lame provides an overall speed boost of 1.7 times, rather than exactly 2. This is because there are other overheads to think about besides raw CPU number-crunching power. Starting one application instance causes 50% total CPU usage on an SMP system, because it is executing within the time frame of 1 CPU (i.e. it is using 100% of 1 CPU's resources). Starting 2 instances usually uses 97 to 100%; however, the first instance will slow down a fraction, due to the overhead of the 2nd instance.

Programs which are truly work-sharing multithreaded are coded so that the overheads of having a 2nd thread running are kept to a minimum (3D renderers, etc). In those cases, the boost is often over 1.7 times normal speed.

So, in short, yes, that almost always holds true for all apps which you can manually 'multithread' by opening two instances.

There are other ways you can speed up threads on SMP systems (and I assume HT as well), like devoting 100% of a known CPU's time to the task and moving all other running processes to the other CPU, but that sometimes speeds things up and sometimes slows them down. In both cases, the difference is usually only 3 to 5% either way.
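The "devote one CPU to the task" trick can be sketched in Python on Linux, where `os.sched_setaffinity` exists. It is not available on Windows or macOS, so the function below simply returns None there. `pin_to_cpu` is a name made up for this example, and whether pinning helps or hurts is workload-dependent.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the current process to a single CPU.

    Returns the resulting set of allowed CPUs, or None where the
    platform does not expose sched_setaffinity (assumption: only
    Linux-like systems do).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)          # 0 means "this process"
    target = min(allowed) if cpu is None else cpu
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)
```

Child processes started afterwards inherit the affinity mask, so calling this before launching an encoder would keep that encoder on the chosen CPU.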
experttech
QUOTE (Garf @ May 22 2003 - 04:25 PM)
QUOTE (experttech @ May 22 2003 - 06:06 AM)
Exactly, running two instances of a process is not Hyperthreading. The code has to be written specifically to make use of Hyperthreading.

Nonsense, anything that can run multiple instances at the same time can make use of hyperthreading.

You don't have to use 'threads' to effectively use 'hyperthreading'.

I won't use words like 'nonsense' etc, but I'd suggest you read up before posting your opinions.

Only two processes running simultaneously that use 'different' areas of the CPU will benefit from HT. For example, a 3D renderer that uses the FPU and another app that uses integer arithmetic, when run together, will make use of HT.

http://www.anandtech.com/cpu/showdoc.html?i=1576&p=4

For a single app, the code *must* be of a multithreading nature to make use of Hyperthreading. It's all about 'thread level parallelism'.
Garf
QUOTE (Gabriel @ May 22 2003 - 01:13 PM)
But it could be more optimal if a given multithread process would use the specific hyperthreading opcodes.

You can also use them in a pure multiprocessing setup. The special opcodes are there to prevent flooding the bus on spinlocks etc. From practical experience, they do not seem to make all that much difference.
Garf
QUOTE
For a single app, the code *must* be of a multithreading nature to make use of Hyperthreading. It's all about 'thread level parallelism'.


Hyperthreading does *NOT* need a multithreading application to work. It only needs two or more concurrent streams of execution, no matter whether they are threads or processes, and no matter whether they are of the same or a different application.

You should not take a vulgarizing explanation on a tech site as the final reference on a technology.
SometimesWarrior
QUOTE (DarkAngel @ May 21 2003 - 12:44 PM)
I use that option on a 'physical' SMP system (Athlon MP), and it works out to be about 1.7x normal speed.
However, don't expect anything amazing from HT. In some cases it's slower, in others, faster. Have a try.

I did a quick test on my dual XP1600+ machine, running Windows XP. I encoded one song from the command line using LAME, which took 1:46. I then opened a second command line, typed the commands to start both encoders (on the same song), and started them near-simultaneously. Both encodes took 1:48. That's about 1.96x normal speed.

I don't know how much CPU time EAC uses, but perhaps if it's ripping quickly, and you're using a fast encoder like mppenc, the hard drive will have trouble keeping up with all the reading and writing. IIRC, when running EAC and a single encoder, Task Manager reports ~55% total CPU usage, so for me at least, I might expect using the 2-encoder EAC option to give me about 1.9x normal speed.
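For anyone wanting to redo the arithmetic with their own timings, here is a small Python sketch. `throughput_speedup` is a name made up for this example: it compares running the jobs one after another (jobs x single time) against running them side by side. With the 1:46 and 1:48 timings above it comes out at roughly 1.96x.

```python
def throughput_speedup(single_s, parallel_s, jobs=2):
    """Speedup of running `jobs` encodes side by side vs. one after another."""
    if single_s <= 0 or parallel_s <= 0 or jobs < 1:
        raise ValueError("times must be positive and jobs >= 1")
    return jobs * single_s / parallel_s


single = 1 * 60 + 46     # 1:46 for one encode on its own
parallel = 1 * 60 + 48   # 1:48 for two encodes started together
print(round(throughput_speedup(single, parallel), 2))  # 1.96
```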
Mike Giacomelli
QUOTE
Programs which are truly work-sharing multithreaded are coded so that the overheads of having a 2nd thread running are kept to a minimum (3D renderers, etc). In those cases, the boost is often over 1.7 times normal speed.


Nope. No multithreaded app is going to beat running two LAME processes at once.
Xenno
QUOTE
Hyperthreading does *NOT* need a multithreading application to work.


True, but it probably needs a true pre-emptive multitasking OS to work...so forget about W9x.

xen-uno

edit: just think...get a dual HT processor system and you'll have 4...count em...4 processors to work with!***

***YMWV
Garf
QUOTE (Xenno @ May 22 2003 - 08:42 PM)
True, but it probably needs a true pre-emptive multitasking OS to work...so forget about W9x.

You need an operating system that can use multiple processors (e.g. Win2K or some versions of Win XP. Note that all versions of WinXP are preemptive multitasking but not all of them support multiprocessors), but that has got nothing whatsoever to do with the discussion at hand.

QUOTE
edit: just think...get a dual HT processor system and you'll have 4...count em...4 processors to work with!***


True, and for some applications this can indeed work very well.
Xenno
> but that has got nothing whatsoever to do with the discussion at hand

Sure it does...a hyperthreaded LAME compile would still need a multi-threaded OS to run on, unless HT as implemented in hardware can perform without OS intervention...but I'll bet not. So HT LAME on a W9x system may be no faster than single threaded LAME.

xen-uno
Garf
QUOTE
a hyperthreaded LAME compile


Hyperthreading has nothing to do with the software or the compiler. It's a hardware function.

QUOTE
would still need a multi-threaded OS to run on


As I stated above (but no one seems to care to read things anyway), you don't need a multithreaded OS or an OS that supports threads to take advantage of hyperthreading. An OS that supports multiprocessing and multiple processes will do.

I'll repeat, word for word:

You don't have to use 'threads' to effectively use 'hyperthreading'.

You can effectively take advantage of hyperthreading by running LAME and mppenc in parallel, even though neither program is multithreading.
Xenno
Garf > You can effectively take advantage of hyperthreading by running...

Not so...

See this

The OS has to be HT aware for any kind of simultaneous threading to occur.

"Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor supporting HT Technology and a Hyper-Threading Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. See http://www.intel.com/info/hyperthreading/ for more information including details on which processors support HT Technology"

xen-uno
Chun-Yu
QUOTE (Xenno @ May 22 2003 - 01:42 PM)
QUOTE

Hyperthreading does *NOT* need a multithreading application to work.


True, but it probably needs a true pre-emptive multitasking OS to work...so forget about W9x.

xen-uno

edit: just think...get a dual HT processor system and you'll have 4...count em...4 processors to work with!***

***YMWV

Windows has always been a preemptive OS since 95. It's just that you need NT/2000/XP (not home) for multi-processor support.
Xenno
Chun,

Thanks for the correction...must have been thinking of WfWG 3.11.

xen-uno
chelgrian
QUOTE (Gabriel @ May 22 2003 - 10:01 AM)
Lame is single threaded, and will probably stay single threaded for a long time.

However, the gogo no coda team did a complete re-implementation of a version of lame in hand-crafted x86 assembler. I believe it was a project to see just how much faster hand-crafted assembler could be than compiled code. It was staggeringly fast, probably a factor of 10 over a normal compiled lame.

They also implemented SMP support in their version which gave a near linear speedup with number of processors.

The problem is that doing this kind of stuff is very time-consuming and takes lots of very detailed knowledge. The version of lame which was converted is now very old and sounds much worse than the latest version.

Question: Has anyone tried compiling lame with ICL7 and some OpenMP directives in choice places?
RyanVM
Hah, that sounds like a fun summer project for a computer science major with way too much free time :P
experttech
QUOTE
It only needs two or more concurrent streams of execution, no matter whether they are threads or processes, and no matter whether they are of the same or a different application


again...provided they access 'different' CPU areas (See the diagram in the link I posted). Otherwise it would be as optimal as running it with HT turned off.

And what I meant by a multithreaded application making use of HT is that, for example, a developer could develop a game in which, say, the 3D rendering and integer arithmetic could be done by two threads. In this case an HT processor will definitely benefit, as the 'two instructions' can be processed in parallel. Agreed, it's not a compiler directive, but developers have to keep this optimization in mind.
Garf
QUOTE (Xenno @ May 22 2003 - 11:15 PM)
Garf > You can effectively take advantage of hyperthreading by running...

Not so...

See this

You probably won't believe me anyway, but the information on Intel's site is just dead wrong or intentionally misleading. From the looks of it, it is just about 'branding' and not for technical reasons. (E.g. their statement that only certain Linux distributions are eligible to carry the 'works with hyperthreading' label.)

Win2K can effectively use hyperthreading and produce speedups of over 40% for some applications. This is irrefutable no matter what website you are going to quote - I have the test data right here.

Although WinXP is touted as being 'hyperthreading optimized', that OS still has severe problems with the system scheduler - it was clearly never designed with hyperthreading in mind. (Note that Win2K has the same issue. You can still get speedups despite the problem.) Again, this is not from some website or promotional ad, I have test data to prove it. MS has admitted the problem will only be solved in Win 2003 Server. So there's no way WinXP is really 'hyperthreading optimized' whereas Win2K is not.

QUOTE
The OS has to be HT aware for any kind of simultaneous threading to occur.


This is just plain wrong and even Intel's site doesn't claim it.
Garf
QUOTE
again...provided they access 'different' CPU areas (See the diagram in the link I posted). Otherwise it would be as optimal as running it with HT turned off.


Of course the same execution unit can't do two things at once, but 'different CPU areas' is something completely different than you seem to think it is. This isn't about one thread doing floating point and one thread integer. It can even work if both programs are running the exact same code, but have to wait for memory. One thread can be working, while the second thread is waiting for the next memory access to get its data.

This is why running the same application twice can give a speedup, although it obviously accesses the same area of the CPU.

And that will work even if the application was never optimized for hyperthreading.

For very tight optimized code that fully loads all processor resources and that doesn't depend on RAM, it won't work. RC5-72 from distributed.net is a nice example - the HT speedup is horrible.
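A deliberately crude way to picture this, as a Python sketch: suppose each stream of execution keeps the core busy only a fraction `busy_fraction` of the time and is stalled on memory the rest. Two streams sharing one core can then at best fill each other's stalls, which caps the gain at min(threads, 1 / busy_fraction). This toy model and the name `smt_speedup_bound` are assumptions for illustration only. It is not how Intel describes or implements Hyperthreading, and it ignores cache and execution-unit contention.

```python
def smt_speedup_bound(busy_fraction, threads=2):
    """Upper bound on throughput gain when `threads` streams share one core.

    busy_fraction: share of time a single stream keeps the core busy
    (1.0 means it never stalls, as with very tight, cache-resident code).
    """
    if not 0 < busy_fraction <= 1:
        raise ValueError("busy_fraction must be in (0, 1]")
    return min(threads, 1.0 / busy_fraction)


print(smt_speedup_bound(1.0))   # 1.0, fully loaded core: nothing to gain
print(smt_speedup_bound(0.8))   # 1.25, stalls leave some room
print(smt_speedup_bound(0.5))   # 2.0, best case for two streams
```

Under this model, code that never stalls gets a bound of 1.0, which matches the intuition that already saturated code has nothing to gain.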
Xenno
OK...

Instead of saying "The OS has to be HT aware for any kind of simultaneous threading to occur" ... I should have said "The OS has to be HT aware for any kind of HT to occur". Yes, it may not be entirely correct but isn't far off the mark if Intel's info is accurate.

> You probably won't believe me anyway, but the information on Intels site is just dead wrong or intentionally misleading

Why would Intel downplay HT on other OS's? Makes no sense from a marketing standpoint.

xen-uno
Garf
QUOTE
Instead of saying "The OS has to be HT aware for any kind of simultaneous threading to occur" ... I should have said "The OS has to be HT aware for any kind of HT to occur".


It's still completely wrong. Hyperthreading also works on a non-HT aware OS, and for some applications it works pretty well.

QUOTE
Why would Intel downplay HT on other OS's? Makes no sense from a marketing standpoint.


Beats me, ask Intel, I'd like to know. The problems on Win2K/WinXP are pretty serious, to the point that enabling hyperthreading can cause severe performance degradation. Maybe there are more problems, that only affect Win2K and not XP, but I have not seen any of those.

My point remains that running 2 LAME processes on a HT CPU might speed things up, even though LAME is completely single-threaded.
schuberth
QUOTE (Mike Giacomelli @ May 22 2003 - 09:17 AM)
Nope.  No multithreaded app is going to beat running two LAME processes at once.

I would like to point out that the boundary between threads and processes is very thin. On some *nixes threads are implemented as processes. For example Linux, which supports Posix threads, implements them as processes with shared memory. So on Linux using threads should not relieve the task scheduler of extra management when compared to simple forking (process multiplication for the win people).

Accordingly, the quoted statement above may or may not hold true depending on how you define "multithreaded app" or "process", which OS you're using and so on.

@Garf
If I understood correctly, HT should enable the processor to save on the context switching time between threads by not completely clearing the pipeline between two virtual processors (or by quick restoration of the content), plus some other optimizations (I never really read the technical documentation of HT so I may have misunderstood the whole thing). According to Intel you need chipset AND OS support, but you're saying that improvements can be gained in other OSes too, without direct support for HT?
Tripwire
HT-unaware OSes see simply two CPUs.
Garf
QUOTE
but you're saying that improvements can be gained in other OSes too, without direct support for HT?


QUOTE
HT-unaware OSes see simply two CPUs.


Yes, and this is enough to get a performance boost in some applications.
chelgrian
QUOTE
I would like to point out that the boundary between threads and processes is very thin. On some *nixes threads are implemented as processes. For example Linux, which supports Posix threads, implements them as processes with shared memory.


Linux does indeed currently do this; however, the performance sucks, and there is also the additional problem that POSIX thread semantics don't exactly map onto Linux process semantics, so there are slight differences between Linux Threads and POSIX Threads.

Linux 2.6 has hooks in the kernel for a much lighter weight thread mechanism. There is also support being developed in GLIBC for this threading mechanism; see http://people.redhat.com/drepper/ for details. Of course Redhat being Redhat have backported this into 2.4 and GLIBC 2.3 and released it in Redhat 9, causing huge binary compatibility headaches :/

Hyperthreading is Intel's take on Simultaneous Multithreading (SMT). Scheduling for SMT is almost exactly the opposite of scheduling for Symmetric Multi Processing (SMP). An SMP scheduler has several sets of physical resources which are unaffected by each other, but the overhead of switching between them is large. This is why schedulers for large-scale SMP systems usually have a concept of processor affinity, i.e. if a process has been executing on a particular processor it is more likely to get scheduled to that processor again. An SMT scheduler has one set of physical resources but several sets of virtual resources with a low switching overhead.

In Intel's implementation a uniprocessor machine with HT switched on looks like an MPS multiprocessor system, therefore any OS which supports SMP will boot and see two processors. An OS which knows about HT can interrogate the BIOS structures and find out if the processors are virtual or physical, then adjust scheduling appropriately. Supposedly Windows XP is HT aware, but as mentioned elsewhere it has severe problems with its scheduler. Linux 2.4 also knows about HT.

If the code is already heavily optimised and is not I/O bound then HT will most likely slow things down even if the OS is HT aware, since the process is more likely to get interrupted. Lame probably fits into this category. HT is really most suited for unpredictable workloads where the process may stall on I/O a lot; the idea is that the processor can then execute another thread without the overhead of a full context switch, by which time the original I/O request will probably have been fulfilled.
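To see from user space whether the "processors" an OS reports are virtual or physical, a rough Python sketch follows. It assumes a Linux system that exposes CPU topology under /sys/devices/system/cpu, which is a different route from the BIOS structures mentioned above; elsewhere `siblings_of` just returns None. Both function names are made up for this example.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,4-5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def siblings_of(cpu=0):
    """Logical CPUs sharing a physical core with `cpu`, or None if unknown."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return None  # not Linux, or topology information not exposed


print("logical CPUs:", os.cpu_count())
print("siblings of cpu0:", siblings_of(0))
```

If cpu0 lists more than one sibling, those logical CPUs share one physical core, which is the information an HT-aware scheduler would want.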