QUOTE(towolf @ Nov 24 2007, 10:08)

QUOTE(rootkit @ Nov 23 2007, 01:51)

Optimization flags will give you 1% speed improvements and less compatibility.
Only an AMD64 build will help, since it has double the registers of i386.
But if Nero uses the Coding Technologies reference code from 3gpp.org, then it's NOT AMD64-aware, so..
What I meant was that the GNU/Linux build is a lot slower than the Windows build, and it was suggested by the devs that it wasn't compiled in the optimal way (without MMX/SSE or whatever). Also Nero must be quite far from the reference code, mustn't it?
The GNU/Linux binary doesn't have any SSE instructions in it: objdump -d linux/neroAacEnc | grep xmm comes up empty. The SSE registers are called xmm0-7 (xmm0-15 on AMD64), so no instructions touch the SSE registers. Nothing touches the %mm MMX registers, either.
Building for AMD64 has lots of advantages: SSE and SSE2 are standard, so you don't need to have a fallback in case they're not present. The extra registers are a big help for floating point number crunching. The ABI is more efficient (function calls pass parameters in registers, instead of on the stack), so less cache traffic. Backwards compat with 386 (and even 8086) has led to some serious cruft and what are now poor design choices for modern CPUs (e.g. stack-based FPU). For stuff that can use SSE2, having it always available is very nice.
On my Core 2 Duo (E6600), wine neroAacEnc.exe is the fastest. The GNU/Linux binary is about half speed (not counting startup time if a bunch of wine stuff has to load from disk). I'm doing movie soundtrack encodes from an 8.6GB float32 wav file written by dcadec. The neroAacEnc_SSE.exe binary runs two threads (on my dual core CPU), but maybe wine makes it really inefficient. When my system is otherwise mostly idle, the _SSE binary runs in maybe 10% less time than the regular Windows binary, but uses twice as much CPU time. If anything is keeping one core busy, the _SSE binary under wine runs _much_ slower, maybe as slow as 1 CPU second per audio second, while still using all the CPU time it can get on both cores. So obviously I always use the non-SSE binary under wine. (I use wine 0.9.46 on AMD64 Ubuntu Gutsy.)
Err, do the nero devs read these threads? There's no bug-reporting URL or information anywhere in the stuff that comes with the zip file, or that I can find on the nero site.
I guess the asm is written as inline asm using MSVC syntax or something? If there are a few key asm routines that would help a lot, you could speed up the GNU/Linux version by moving them to a separate .asm source file that you compile with nasm or yasm. That has the advantage of using Intel assembler syntax instead of AT&T, so you wouldn't have to rewrite things, just call them differently. (Sorry if you already know that. And wine works so well that if the windoze binary under wine is still even faster, I'd probably keep using it.)
Anyway, I logged on to the forum to report a couple of bugs I've found in neroAacEnc (package build date: Aug 6 2007, version 1.1.34.2). I run the win32 binaries under wine 0.9.46 on AMD64 Ubuntu Gutsy.
1. 2pass mode from large input files doesn't use the requested bitrate. Instead it uses a very small bitrate that sounds horrible.
I have an 8.6GB (9230966852B) .wav file made with dcadec on a movie soundtrack which is 8013.06 seconds long.
The RIFF header has the file size modulo 2^32, i.e. ~611MB. 1pass encoding, e.g.
wine .../neroAacEnc.exe -br 398860 -if my.dcadec.wav -of my.dcadec.mp4
produces a nice sounding mp4 of the first 556.5s. And with -ignorelength, it produces a good AAC for the whole thing. So no real surprises so far.
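For anyone who wants to check the wrapped-length arithmetic, here is a minimal C sketch. The function names are made up for illustration. The 68-byte header size and the 1152000 B/s byte rate are read off the hexdump further down (the "data" payload starts at offset 0x44), so treat them as my reading of this one file and not as anything general.

```c
#include <assert.h>
#include <stdint.h>

/* Value a 32-bit RIFF length field ends up holding for `actual` bytes. */
static uint32_t riff_wrapped_size(uint64_t actual)
{
    return (uint32_t)(actual & 0xFFFFFFFFu);   /* same as actual % 2^32 */
}

/* Seconds of audio a reader sees if it trusts the wrapped data-chunk size. */
static double seconds_visible(uint64_t data_bytes, uint32_t byte_rate)
{
    assert(byte_rate != 0);
    return (double)riff_wrapped_size(data_bytes) / (double)byte_rate;
}
```

With data_bytes = 9230966852 - 68, riff_wrapped_size() gives 641032192 (0x26356000, the value stored in the data chunk header), and seconds_visible() with a byte rate of 1152000 comes out at about 556.45 s. That agrees with the 556.5 s the 1pass encode produced.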
With
CODE
wine neroAacEnc.exe -2pass -ignorelength -br 398860 -if my.dcadec.wav -of my.dcadec.ignlen.2pass.m4a
I get an AAC file with the right length in seconds, but the wrong length in bytes! It's only 91302706B (88MB), ~89.6kbit/s, and sounds horrible. Much much worse than a 2pass encode at 90kb/s of the first 9 min (done by omitting -ignorelength). The encoder's terminal output shows it running through all 8012 seconds on the first and second pass, though, and at normal speed. (about 29 minutes for both passes total on a 2.4GHz C2Duo using a single thread.) Sorry I forgot to copy&paste from my terminal window before I did stuff that scrolled the output off the screen. By the time I got this all typed, I probably could have run the encode again, but I didn't start it half an hour ago, and I'm almost done typing.
BTW, I'm using 2pass because I already encoded the video, so I know how much space is left for audio to hit just under 4482MB. If I'm feeling ambitious, maybe I'll get my encoding script to see how big the audio is after a VBR encode, and set the target video bitrate for the second pass based on that...
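The size-budget calculation is just bytes times 8 divided by seconds. A minimal sketch, with a hypothetical function name, and ignoring MP4 container overhead (which I haven't measured):

```c
#include <assert.h>
#include <stdint.h>

/* Average bitrate in bit/s that fills `budget_bytes` over `duration_s` seconds. */
static double bitrate_for_budget(uint64_t budget_bytes, double duration_s)
{
    assert(duration_s > 0.0);
    return (double)budget_bytes * 8.0 / duration_s;
}
```

Working backwards from the numbers in this post, -br 398860 over 8013.06 s corresponds to an audio budget of roughly 399.5 million bytes (about 381 MiB). That figure is derived from the bitrate and the duration; it's not something I measured separately.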
My input file is 6 channel 32bit float. Its RIFF "fmt " header uses WAVE_FORMAT_EXTENSIBLE. (see
http://www-mmsp.ece.mcgill.ca/Documents/Au.../WAVE/WAVE.html in case my terminology doesn't match what you're used to.)
CODE
00000000 52 49 46 46 3c 60 35 26 57 41 56 45 66 6d 74 20 |RIFF<`5&WAVEfmt |
00000010 28 00 00 00 fe ff 06 00 80 bb 00 00 00 94 11 00 |(...............|
00000020 18 00 20 00 16 00 20 00 3f 00 00 00 03 00 00 00 |.. ... .?.......|
00000030 00 00 10 00 80 00 00 aa 00 38 9b 71 64 61 74 61 |.........8.qdata|
00000040 00 60 35 26 00 00 00 00 00 00 00 00 00 00 00 00 |.`5&............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
(I hope this is useful. I considered using xxd to hexdump in a format that xxd could turn back into a binary, but I thought the traditional hexdump format would be simplest. I don't expect the bug to be hard to reproduce.)
There are no more RIFF chunks later in the file, AFAICT. I searched for "RIFF", "WAVE", and "data", and none of those strings appears in the file except at the beginning. So it's just a wav file with a wrapped length in the header.
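In case it helps to read the hexdump, here is a rough C sketch that decodes the 40-byte "fmt " chunk body (file offsets 0x14 to 0x3b in the dump above). The struct and function names are my own invention, and this only handles the WAVE_FORMAT_EXTENSIBLE layout; it is not a general WAV parser and says nothing about how neroAacEnc reads headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers from a byte buffer. */
static uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct wav_fmt {
    uint16_t format_tag;       /* 0xFFFE = WAVE_FORMAT_EXTENSIBLE */
    uint16_t channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
    uint16_t valid_bits;
    uint32_t channel_mask;
    uint16_t sub_format;       /* first two bytes of the SubFormat GUID; 3 = IEEE float */
};

/* The "fmt " chunk body copied from the hexdump (offsets 0x14..0x3b). */
static const unsigned char example_fmt_body[40] = {
    0xfe, 0xff, 0x06, 0x00, 0x80, 0xbb, 0x00, 0x00, 0x00, 0x94, 0x11, 0x00,
    0x18, 0x00, 0x20, 0x00, 0x16, 0x00, 0x20, 0x00, 0x3f, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa,
    0x00, 0x38, 0x9b, 0x71
};

/* Decode an extensible "fmt " body; returns 0 on success, -1 if it doesn't fit. */
static int parse_fmt_extensible(const unsigned char *body, size_t len, struct wav_fmt *out)
{
    assert(body != NULL && out != NULL);
    if (len < 40 || le16(body) != 0xFFFE || le16(body + 16) < 22)
        return -1;                       /* too short, wrong tag, or cbSize too small */
    out->format_tag      = le16(body);
    out->channels        = le16(body + 2);
    out->sample_rate     = le32(body + 4);
    out->byte_rate       = le32(body + 8);
    out->block_align     = le16(body + 12);
    out->bits_per_sample = le16(body + 14);
    out->valid_bits      = le16(body + 18);
    out->channel_mask    = le32(body + 20);
    out->sub_format      = le16(body + 24);
    return 0;
}
```

Run on example_fmt_body, this gives 6 channels, 48000 Hz, 1152000 bytes/s, 24-byte block align, 32 bits per sample, channel mask 0x3f, and sub-format 3 (IEEE float), which is what I described above.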
I get the same results with wine neroAacEnc_SSE.exe (although the file is slightly smaller, 89831446B). I can't test with the GNU/Linux binary because of the next bug I found:
2. The GNU/Linux version doesn't support large files. strace output:
...
CODE
write(2, "\n", 1) = 1
brk(0) = 0x826e000
brk(0x828f000) = 0x828f000
open("uw.dcadec.wav", O_RDONLY) = -1 EFBIG (File too large)
fstat64(0x1, 0xffcfd36c) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0xffcfd330) = 0xfffffffff7f75000
write(1, "ERROR: could not open WAV file\n", 31) = 31
exit_group(1) = ?
You can enable large file support by putting this before any system header files are included:
CODE
#define _FILE_OFFSET_BITS 64
If there's a header file that all your source files include before anything else, the top of it is a good place to put a feature-test macro definition like that.
I usually #define _GNU_SOURCE in my own projects, as that enables all the goodies, including the explicit LFS interfaces (open64(), off64_t, etc.). Note that on glibc it doesn't switch plain open() and off_t to 64 bits by itself; you still need the _FILE_OFFSET_BITS define for that. If you don't want to get sucked into using GNU extensions (maybe you're still planning an OS X port...), then the _FILE_OFFSET_BITS define alone will just replace open with open64, etc., and everything will Just Work. (open() will be called with O_LARGEFILE.) Just make sure you use off_t instead of int any time you need to store a file offset or size, since it becomes a 64bit type.
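A minimal sketch of what that looks like in practice, assuming glibc on Linux. file_size() is a hypothetical helper for illustration, not anything from the Nero sources:

```c
#define _FILE_OFFSET_BITS 64   /* must come before every system header */

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fail the build early if large file support did not take effect. */
_Static_assert(sizeof(off_t) == 8, "large file support is not enabled");

/* Return the size of `path` in bytes, or -1 on error. */
static off_t file_size(const char *path)
{
    struct stat st;
    off_t size = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);   /* becomes the 64-bit variant on 32-bit glibc */
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0)
        size = st.st_size;       /* off_t, so sizes above 2GB fit */
    close(fd);
    return size;
}
```

The _Static_assert is optional, but it catches the case where some header sneaks in ahead of the define and off_t quietly stays 32 bits. Storing the result in an int instead of off_t would bring the truncation right back.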
Thanks very much for making a free high quality HE-AAC encoder available. I'm very impressed.
edit: I forgot to mention a feature request:
3. 2pass encoding from stdin would work if you save the stats to file, like video codecs (e.g. x264). If you don't want to expose the inner workings of your encoder to prying eyes that much more easily, then I understand (and lament that non-Free software always has to have mis-features like this). It would be really nice not to have 8.6GB temporary files... (OTOH I'll probably usually use -q, not -2pass -br).
Hope this helps, and happy hacking.