FLAC I/O less efficient than STDIN

I was optimizing caudec when I came across this oddity. Basically, letting /usr/bin/flac access .flac files on a slowish HDD directly for decoding ('flac -d file.flac') was, in one particular case, almost twice as slow as piping the files to /usr/bin/flac via STDIN ('cat file.flac | flac -d -').

I used a double album for testing, made of 37 tracks for a total of about 1 GiB, located on an HDD that tops out at about 70 MB/s. Incidentally, flac decodes on my machine at a similar rate.

I ran caudec twice (figuratively - I repeated the tests many times) with 8 concurrent processes, for decoding those FLAC files to WAV on a ramdisk. I made sure to drop all caches between each run. First run was with direct file access, and completed in 40 seconds. Second run was with piping to STDIN, and completed in 25 seconds.
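
The procedure described above could be scripted roughly as follows. This is a minimal sketch, not what caudec actually does: the `SRC` and `OUT` paths, the `JOBS` variable and all function names are assumptions made for illustration, and `drop_caches` has to run as root.

```shell
#!/bin/sh
# Hypothetical benchmark harness; paths and names are placeholders.
SRC=${SRC:-/mnt/hdd/album}     # directory with the .flac files (assumed)
OUT=${OUT:-/dev/shm/wav}       # ramdisk output directory (assumed)
JOBS=${JOBS:-8}                # number of concurrent decoders

# Flush dirty pages, then drop page cache, dentries and inodes (root only).
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# flac opens each file itself.
run_direct() {
    find "$SRC" -name '*.flac' -print0 |
        xargs -0 -P "$JOBS" -I{} sh -c \
            'flac -s -f -d -o "$2/$(basename "$1" .flac).wav" "$1"' sh {} "$OUT"
}

# cat reads each file and feeds flac through a pipe.
run_piped() {
    find "$SRC" -name '*.flac' -print0 |
        xargs -0 -P "$JOBS" -I{} sh -c \
            'cat "$1" | flac -s -f -d -o "$2/$(basename "$1" .flac).wav" -' sh {} "$OUT"
}

# Run a command and print the wall-clock seconds it took.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Example session (as root):
#   mkdir -p "$OUT"
#   drop_caches; elapsed run_direct
#   drop_caches; elapsed run_piped
```

Dropping the caches before each run matters here: without it, the second run reads the files from RAM and the comparison says nothing about the disk.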

The difference was, surprisingly, much less pronounced on a USB flash drive that tops out at 35 MB/s (34 seconds vs. 30 seconds), and nonexistent on a RAID 1 array that tops out at 130 MB/s and on an SSD that tops out at 500 MB/s. I experienced similar differences with WavPack.

Does anyone have any idea of what's going on?

Reply #1
Running the reading and decoding processes in parallel?

Reply #2
I would imagine reading and writing to the same mechanical disk would be the culprit. This is supported by the USB drive measurement but that doesn't appear to explain everything. If I am understanding correctly the plain version of decoding is slower than a workaround-type STDIN decoding. Although I am unfamiliar with *nix there are two possibilities (either or both):

OS - handling of STDIO regarding write-buffering via HDD driver and/or CPU;
BIN - when outputting the data from a STDIN source it reads/writes larger chunks and the chunk size works better with the write-buffering and/or CPU cache (I recall something on FB2K about differences in the seek table when the encoder used STDIN; not sure why this would also affect decoding).
"Something bothering you, Mister Spock?"


Reply #4
Ok, I see that in your third paragraph. (Pardon my oversight, I posted right before a medical appointment so apparently I was distracted.)

If I read correctly:
FLAC -d [HDD -> RAMdisk] = 40s
FLAC STDIO [HDD -> RAMdisk] = 25s
FLAC -d [flashdrive -> RAMdisk] = 34s
FLAC STDIO [flashdrive -> RAMdisk] = 30s
RAID = no tangible difference
SSD = no tangible difference

You mentioned WavPack having similar behavior so it doesn't seem the binary is the culprit either. I don't suppose using less concurrent threads would improve the performance of FLAC -d but it might be worth checking. It also is unclear how these threads are distributed but I wondered if multiple decode threads caused an unintended bottleneck (especially with fast-decoding formats).

edit: I should also have mentioned I thought an instance of STDIO was limited to one thread per file, but this may be a bad assumption on my part.

Reply #5
> If I read correctly:

Yes.

> I don't suppose using less concurrent threads would improve the performance of FLAC -d but it might be worth checking.

Using 4 processes instead of 8: HDD direct: 45 seconds, HDD STDIN: 27 seconds; USB direct: 34 seconds, USB STDIN: 31 seconds. Note that I'm using a quad-core CPU with hyperthreading (4 cores, 8 threads).

> I should also have mentioned I thought an instance of STDIO was limited to one thread per file, but this may be a bad assumption on my part.

Yes. Really, the only difference here is that I delegated the reading to /usr/bin/cat. That alone magically improves performance, particularly in the HDD case. /usr/bin/cat is doing something right, that /usr/bin/flac is doing wrong, or so it seems anyway.

Reply #6
try using just one thread for cat and the same for flac -d.  perhaps one binary is optimized for multi-core, and one is not.

seems to me that flac would use ram to decode.  the less ram (because of the ramdisk) may be limiting the decode ability of flac.  but I think that maybe the same could be said for cat.  that's assuming you used actual ram and not swap or other temporary hard drive space for the ramdisk.

also... /usr/bin/flac seems like a binary provided by your distribution.  maybe try using a more optimized one that you compiled, or even one from rarewares (if they have it) since caudec supports wine anyway.

Reply #7
I tested this on my machine with mostly insignificant differences.

On a 171MB FLAC -8 encoded file.
I ran one decode to allow Linux to cache the FLAC file in RAM and then discarded the results.

I did 3 runs...

A proper process efficient redirection to standard input:
flac -o test.wav -d - < 01-A\ Change\ Of\ Seasons.flac

real    0m3.740s
user    0m3.432s
sys    0m0.300s


A process inefficient pipe from cat:
cat 01-A\ Change\ Of\ Seasons.flac | flac -o test.wav -d -

real    0m3.869s
user    0m3.428s
sys    0m0.720s


Allowing FLAC to read the file itself:
flac -o test.wav -d 01-A\ Change\ Of\ Seasons.flac

real    0m3.765s
user    0m3.392s
sys    0m0.336s

I ran 3 runs of each test and while the numbers fluctuated slightly, the time spread remained similar on all runs.

In the given example run:
The difference between the best run (process efficient redirect) and the worst (pipe from cat) is 129ms
The difference between the process efficient redirect and directly reading is only 25ms.

Given this admittedly abysmally inadequate sample size it would appear that shell STDIN redirection provides the fastest decode, but the difference between redirection and directly reading the file is small enough to basically dismiss as noise. It would appear that in all cases the context switches involved with invoking cat yield the slowest results by a significant margin.

Reply #8
> I tested this on my machine with mostly insignificant differences.

For obvious reasons:

> On a 171MB FLAC -8 encoded file.

You used a single file (my experiments use many, concurrently) that amounts to a rather small amount of data (I used a total of 1 GiB in order to make the differences more pronounced)!

> I ran one decode to allow Linux to cache the FLAC file in RAM and then discarded the results.

You let your OS cache your file on purpose, so it was decoded from RAM. Why? I'm talking about hard drive access. Do you even understand what I'm talking about?

> A proper process efficient redirection to standard input:
> flac -o test.wav -d - < 01-A\ Change\ Of\ Seasons.flac

Actually, I tried that, and it's a lot less efficient in my experiments than piping cat's output: my test, using that method, completes in 40 seconds (versus 25) off my HDD.

> Given this admittedly abysmally inadequate sample size

Your entire testing process is completely off-topic.

Reply #9
> You let your OS cache your file on purpose, so it was decoded from RAM. Why? I'm talking about hard drive access. Do you even understand what I'm talking about?

I eliminated the hard drive from the equation because you're trying to investigate an issue with many variables in play.
My goal was to narrow this down to one factor, the method of delivering data to flac, and to test to see if there is a discernible and significant performance pattern related to them.
As expected, in my limited testing, using shell builtin redirection was faster than spawning a wasted cat process and was slightly faster than letting flac read it directly.

You came here with a question about why in your script you are seeing this anomaly, and the first step in that is to break down the steps you use and test if there is an inherent inefficiency in them.
You raised the question about the different performance based on how the data was delivered to flac, and I set out to test each method to see if there was a significant performance hit for any one. You take a slight performance penalty for spawning unneeded processes, and it has an impact even on a single decode. Multiply that 129ms by several hundred decodes and it adds up. I was simply presenting a small data set test to show the performance differences between the methods you asked about. It may or may not be the source of your problem, but it's a data point that can be considered and then confirmed or eliminated as a contributing factor.

You came here with a problem and I tried to provide a small bit of data to aid in your investigation. I'm sorry if my attempt at helping offends you.

>> A proper process efficient redirection to standard input:
>> flac -o test.wav -d - < 01-A\ Change\ Of\ Seasons.flac
>
> Actually, I tried that, and it's a lot less efficient in my experiments than piping cat's output: my test, using that method, completes in 40 seconds (versus 25) off my HDD.

Then there is something else terribly wrong. Shell native redirection should ALWAYS be faster than piping the data from cat. There's an entire process that no longer needs to be spawned and managed (cat) for every single decode operation. 

I'm not sure why there appears to be a slight performance hit for having flac read the file directly. I'd need to dig into the sources to see where the difference lies but I would imagine there was a lot more thought put into efficient IO by the people writing bash than by the guy who wrote flac.

Reply #10
You might try `blockdev --setra 65536 --setfra 65536 <device>` to set blockdev/fs readahead to ridiculously high values.

It's possible that the difference in performance between the HD, USB HD and RAID are primarily due to small I/O timing differences between the processes tickling the pagecache in different ways.

Reply #11
> Then there is something else terribly wrong. Shell native redirection should ALWAYS be faster than piping the data from cat. There's an entire process that no longer needs to be spawned and managed (cat) for every single decode operation.

Not true. The pipe adds an extra layer of buffering between the filesystem read and the decoding process (and one whose size is adjusted dynamically by the kernel). With a redirect, whenever flac read()s stdin for new data, the read goes right to the kernel. With a pipe, the filesystem read may have already been completed by cat.
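
To make the distinction concrete, here is a small sketch that reports what a process's standard input actually is in each case. The function name `stdin_kind` is made up for this example, and it assumes a Linux-style `/dev/stdin`. With a redirect the process holds the regular file itself, so each read() goes to the filesystem; with `cat |` it holds a pipe that cat keeps filling.

```shell
# Report whether standard input is a pipe, a regular file, or something else.
# Hypothetical helper for illustration; relies on /dev/stdin being available.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe
    elif [ -f /dev/stdin ]; then
        echo file
    else
        echo other
    fi
}

# Example:
#   stdin_kind < file.flac        # prints "file"
#   cat file.flac | stdin_kind    # prints "pipe"
```

This only shows which kind of descriptor the decoder reads from; it does not by itself measure how much the extra buffering helps.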


Reply #13
> You might try `blockdev --setra 65536 --setfra 65536 <device>` to set blockdev/fs readahead to ridiculously high values.

Bingo! With those values, the test on the HDD completed in 16 seconds in all cases! All my drives were set to 256 sectors (128 KiB). I noticed that performance improved dramatically when adjusting that value a single step up (512), and 2048 (1 MiB) sounds like a rather sane value.
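
For anyone wanting to check their own drives, a minimal sketch follows. It assumes readahead is counted in 512-byte sectors, which matches the 256 = 128 KiB figure above. The function names and the `/dev/sdX` device are placeholders, and `blockdev` generally needs root.

```shell
# Convert a readahead value in 512-byte sectors to KiB.
sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Print the current readahead of a block device in sectors and KiB.
# The device argument (e.g. /dev/sdX) is a placeholder.
show_readahead() {
    dev=$1
    ra=$(blockdev --getra "$dev") || return 1
    echo "$dev: $ra sectors ($(sectors_to_kib "$ra") KiB)"
}

# Example (as root):
#   show_readahead /dev/sdX
#   blockdev --setra 2048 /dev/sdX    # 2048 sectors = 1 MiB
```

By the same arithmetic, the 65536 suggested earlier works out to 32 MiB of readahead.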

Reply #14
Cool.

Note that --setra and --setfra are completely different settings IIRC. Setting these values too high could compromise performance on other applications, so unless the drive is devoted to music, you should probably tune them down appropriately.

I'm rather curious as to if you can improve performance at the default readahead values by instead tuning CFQ params.

Reply #15
> I'm rather curious as to if you can improve performance at the default readahead values by instead tuning CFQ params.

Yes: 23 seconds with CFQ/readahead at 256 (vs. 40 seconds with deadline), 17 seconds with CFQ/readahead at 16384.

I completely forgot that I changed the scheduler to deadline years ago.

Reply #16
>> I'm rather curious as to if you can improve performance at the default readahead values by instead tuning CFQ params.
>
> Yes: 23 seconds with CFQ/readahead at 256 (vs. 40 seconds with deadline), 17 seconds with CFQ/readahead at 16384.
>
> I completely forgot that I changed the scheduler to deadline years ago.

Yes, I could understand how CFQ would perform better in this sort of workload. But 1MiB readahead still seems a *tad* too high. I was imagining that tweaking things like /sys/block/*/queue/iosched/slice_idle (or other settings described in cfq-iosched.txt) could help.
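
A rough sketch of how one might inspect those settings, assuming the usual sysfs layout where the active scheduler is shown in square brackets (e.g. "noop deadline [cfq]"). The function names and the `sdX` device are placeholders, and the slice_idle value in the example is only an illustration, not a recommendation. The iosched directory exposes slice_idle only while CFQ is the active scheduler on a kernel that still ships it.

```shell
# Extract the active scheduler from text such as "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print the active scheduler of a disk and, if exposed, its slice_idle value.
# The argument is a bare device name such as sdX (placeholder).
show_iosched() {
    q=/sys/block/$1/queue
    echo "scheduler: $(active_scheduler < "$q/scheduler")"
    if [ -r "$q/iosched/slice_idle" ]; then
        echo "slice_idle: $(cat "$q/iosched/slice_idle")"
    fi
}

# Example (as root; only meaningful where CFQ is available):
#   echo cfq > /sys/block/sdX/queue/scheduler
#   echo 0 > /sys/block/sdX/queue/iosched/slice_idle
#   show_iosched sdX
```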