So, you're stuck with a lot of lossy files and no lossless original because you didn't rip the files yourself. That is a different situation to many of us here, where we don't have to worry about transcoding because we can go back to the lossless source and apply DSP before encoding to MP3.
You also want loud playback on your DAP which can barely provide enough volume in your IEMs for your tastes, so even using Rockbox (supported on some other Cowon models, which provides ReplayGain tag support) wouldn't work. A headphone pre-amp would work, but you don't have one.
You say peak normalisation works better for you than ReplayGain. You're probably accepting uneven loudness in order to get the overall loudness higher and beat the background noise. ReplayGain certainly works better than peak normalisation for creating even perceived loudness from track to track or album to album, so you don't have to reach for the volume control too often. But if the environment is noisy enough, you might not perceive the loudness as even anyway, or you may need to raise the average level of some passages in the music, for which dynamic compression is the only option left.
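To put rough numbers on the difference, here is a minimal Python sketch. The two tracks and their peak and loudness figures are made up for illustration; the only fixed value is ReplayGain's usual 89 dB reference level. Peak normalisation only looks at the highest sample, so a dynamic track and a heavily compressed one end up far apart in loudness, whereas ReplayGain brings both to the same target.

```python
import math

def peak_normalise_gain_db(peak):
    """Gain in dB that raises a linear peak (1.0 = full scale) to exactly full scale."""
    return -20.0 * math.log10(peak)

def replaygain_db(measured_loudness_db, target_db=89.0):
    """Gain in dB that moves a track's measured loudness to the target level."""
    return target_db - measured_loudness_db

# Hypothetical tracks: (name, linear peak, measured loudness in dB on ReplayGain's scale)
tracks = [
    ("dynamic classical", 0.50, 80.0),
    ("modern pop", 0.99, 97.0),
]

def loudness_after(tracks, mode, target_db=89.0):
    """Return {name: resulting loudness in dB} after 'peak' or 'rg' adjustment."""
    result = {}
    for name, peak, loudness in tracks:
        if mode == "peak":
            gain = peak_normalise_gain_db(peak)
        else:
            gain = replaygain_db(loudness, target_db)
        result[name] = round(loudness + gain, 2)
    return result

peak_result = loudness_after(tracks, "peak")  # tracks stay far apart in loudness
rg_result = loudness_after(tracks, "rg")      # both tracks land on the target
print(peak_result)
print(rg_result)
```

With these invented figures the peak-normalised tracks still differ by more than 10 dB, which is the uneven loudness you'd be accepting.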
If you're lacking in volume with your DAP I would really suggest subtle dynamic range compression (DRC), which is much less brutal than modern mastering practices. The foo_dsp_vlevel DSP at a low strength setting might be good.
Clipping of about 1-3 samples' duration (at 44.1 kHz) is probably going to be inaudible in the vast majority of cases, which is why I suggest using RG with 3-5 dB of preamp without clipping prevention. Encoder testing is done without clipping prevention, and many LAME -V2 and Ogg Vorbis -q5 encodes decode with clipping (because the peak value is often about 1.2 x full-scale for most modern source CDs) with no loss of transparency. Otherwise these cases would be deemed "problem samples".
I don't know of any audible problem samples caused by clipping of the decoded MP3. They all seem to relate to characteristics of the original CD audio such as short transients and not to the fact that clipping occurs when decoding and restricting to 16-bit. Only excessive additional gain provided by setting an mp3gain target (or fb2k preamp) very high would cause audible problems.
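If you want to check how long the clipped stretches actually are at a given preamp, something like the following Python sketch would do it. It assumes you already have decoded samples as floats where 1.0 is full scale; the sample values here are invented, and the function name is just for illustration.

```python
def clipped_run_lengths(samples, gain_db):
    """Lengths of consecutive-sample runs that exceed full scale (1.0) after gain."""
    factor = 10.0 ** (gain_db / 20.0)  # convert dB to a linear multiplier
    runs = []
    current = 0
    for s in samples:
        if abs(s * factor) > 1.0:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)  # a run that reaches the end of the data
    return runs

# Invented sample values, floats with 1.0 = full scale
samples = [0.1, 0.5, 0.8, 0.9, 0.7, 0.2, -0.3, -0.75, -0.4, 0.0]

runs = clipped_run_lengths(samples, gain_db=3.0)
longest_ms = max(runs) / 44100 * 1000 if runs else 0.0  # duration at 44.1 kHz
print(runs, longest_ms)
```

Runs of only a few samples are the sort of clipping I'd expect to be inaudible; long runs would be the ones worth listening for.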
To achieve the DAP loudness you desire, you have to accept some compromise. Allowing SOME clipping would probably have little to no detrimental effect on the music, which is why I suggested it in option 3. It also doesn't require a transcode if your source is MP3 or AAC-LC (common .mp4 or .m4a encodings) because mp3gain, aacgain or foobar2000 can apply global gain field adjustments to increase the loudness without transcoding. (Vorbis works differently, so vorbisgain is only tag-based RG which the player must support, or you must encode from a RG-adjusted source, such as with foobar's converter)
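The global gain field in MP3 moves in steps of a factor of 2^(1/4), roughly 1.5 dB, so the gain you actually get is quantised. A rough Python sketch of the arithmetic follows. Rounding to the nearest step is my assumption about how the tools pick the step count, and the function name is hypothetical.

```python
import math

# One global-gain step is a factor of 2**(1/4), i.e. about 1.505 dB
STEP_DB = 20.0 * math.log10(2.0 ** 0.25)

def global_gain_steps(rg_track_gain_db, target_db=94.0, reference_db=89.0):
    """Return (steps, applied_db) for a track's ReplayGain value and a loudness target.

    Assumption: the tool rounds to the nearest whole step.
    """
    desired_db = rg_track_gain_db + (target_db - reference_db)
    steps = round(desired_db / STEP_DB)
    return steps, steps * STEP_DB

# A loud modern track with a ReplayGain track gain of -7.3 dB, aiming for 94 dB
steps, applied_db = global_gain_steps(-7.3)
print(steps, round(applied_db, 2))
```

So a track asking for -2.3 dB at a 94 dB target would end up about 3 dB down, which is within the usual step-size error and nothing to worry about.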
I'd say the options in my previous post are in order of decreasing sound quality / increasing alteration from the source audio, but even numbers 5 and 6 can give an enjoyable audio experience that's better than struggling to hear the sound and better than modern CD mastering practice if you have a suitably dynamic source. Even transcoding can allow a pleasant experience. I'd suggest a double filename extension to indicate the original source format and that it's transcoded.
Transcoding options, in rough order of best to worst (based on others' experiences and investigation, not my own ABX tests):
NO TIME-SPREADING, NO FURTHER LOSS:
1. Transcode to lossless, e.g. FLAC or ALAC as supported by your DAP. No worse than listening directly to foobar output, but bitrate is about 700 kbps or more (less if you use modest ReplayGain targets than if you use high volume).
NO TIME-SPREADING, only added noise via near-lossless encoding:
2. Use lossyWAV/lossyFLAC/lossyWV. Transcode using lossyWAV followed by a compatible lossless compressor like FLAC (if your Cowon D2 supports FLAC). Probably audibly indistinguishable at --standard setting or above (typically ~460 kbps). In noisy environments (e.g. DAP on public transport), the --portable setting (~380 kbps) or even settings as aggressive as -q 0 (~290 kbps) may be indistinguishable from uncompressed, though WavPack hybrid (see below) might be better, if it's supported. Should avoid conventional transcoding artifacts at a bitrate close to high-bitrate conventional lossy.
3. Transcode to WavPack hybrid. Supported by only a few DAPs, but by all of those running Rockbox. It doesn't measure the noise floor the way lossyWAV --standard does, but it probably hides the added noise better than lossyWAV --portable or -q 0, so could be better at the same bitrate. Can get pretty good quality as low as about 250 kbps.
TIME SPREADING and other transcoding artifacts/unmasking possible:
4. Transcode to a different conventional lossy format at high bitrate. Choose another supported lossy format with a good high-quality setting, e.g. a VBR quality setting that normally gives transparency or near-transparency. Even going from an MP3 source, many MP3-only players can actually handle MP2 files, sometimes only if renamed as .MP3, so it might even be an option for the most basic DAPs and digital radios with MP3 support, though MP2 is poor below 192 kbps and has an inefficient dual stereo mode for quality with no safe joint stereo mode.
5. Transcode to a different lossy format at moderate bitrate. If bitrate is quite important to you, choose a lower setting from a good encoder, e.g. LAME -V5 or AAC 128 kbps.
6. Transcode to the same lossy format at high bitrate. Using the same format seems to cause problems (e.g. mp3 to mp3 and aac to aac are both worse than mp3 to aac or aac to mp3), but sometimes only one format is acceptable for compatibility reasons. Padding with a few milliseconds of silence can mitigate some but not all of the adverse effects, apparently. Encoding at extreme settings such as LAME -V1 or -V0 (beyond the standard just-transparent setting) can help reduce the potential unmasking artifacts a little at the expense of higher bitrate (e.g. 200 kbps or more).
7. Transcode to the same lossy format at 'standard' setting. LAME -V2 or Ogg Vorbis -q5, for example, is normally considered transparent, but its artifacts can be unmasked by double-encoding.
8. Transcode to the same lossy format at low bitrate. Using LAME -V5 for example (~130 kbps) you might get fairly good results or some nasty transcoding artifacts from time to time.
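To see what those bitrates mean for storage on the DAP, here's a back-of-envelope Python sketch. The bitrates are the approximate figures quoted in the list above; the 4-minute track length is just an assumed example, and real files vary with the music.

```python
def size_mb(bitrate_kbps, seconds):
    """Approximate file size in MB (10**6 bytes) for an average bitrate and duration."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

# Approximate average bitrates taken from the options listed above
options = {
    "lossless (FLAC/ALAC)": 700,
    "lossyWAV --standard": 460,
    "lossyWAV --portable": 380,
    "lossyWAV -q 0": 290,
    "WavPack hybrid": 250,
    "LAME -V5": 130,
}

TRACK_SECONDS = 4 * 60  # assumed example track length

sizes = {name: round(size_mb(kbps, TRACK_SECONDS), 1) for name, kbps in options.items()}
for name, mb in sizes.items():
    print(f"{name:22s} {mb:5.1f} MB")
```

That makes it easier to decide how many files you can afford to keep at the higher-quality end of the list.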
My first recommendation: Don't get too hung up on clipping-prevention-at-all-costs. With modest RG pre-amps (or mp3gain set to as high as 92 or 94 dB), the clipping isn't likely to be audible, especially during the dramatic peaks where it's most likely to occur.
My second recommendation: Don't be too afraid of dynamic compression. A little subtle DRC can improve the listening experience in noisy environments or with weedy DAP volume without seriously degrading the overall emotional journey in the music or robbing it of its punch, kick or sparkle. Check to see if your DAP incorporates any DRC options.
My third recommendation: Don't be too afraid of transcoding. It may still be a part of the most practical way of maximising listening pleasure within the constraints of certain situations.
My final recommendation: Don't be afraid of strong dynamic range compression. In difficult acoustic environments where volume must be restricted or background noise is high, or you must prevent hearing damage, it can genuinely provide the best sound quality possible in the circumstances. Even coupled with transcoding, it might be the best solution possible in such circumstances and provide a pleasant sound without causing listener fatigue.
If you're really stuck with a DAP that's too quiet for you, and you're listening in a high-background-noise environment, I'd imagine very dynamic tracks (like Queen's well-known Bohemian Rhapsody) would end up lost in the noise during the quiet parts. In that case, I'd truly suggest foo_dsp_vlevel is worth a try (on a subtle setting), followed by the least harmful transcoding setting that you can manage and then, for example, mp3gain with a little clipping permitted. "Least harmful" depends both on the format support of your DAP and on the bitrate you can accept, given the quantity of files that you have to transcode versus those you can leave alone. That might allow 700 kbps for a small number of transcoded files, while the rest (e.g. modern compressed recordings) remain at, say, 128 kbps in their original format.
Ideally, you'd quarantine the files created for your DAP in a folder called, for example "C:\Cowon D2 Music - clipping, DRC & transcoding possible" so you don't play them on your PC when you have the superior originals instead. You could also use informative file names or double file name extensions, such as:
01 - RockorPopTitle - Artist1.RG94.mp3
09 - DynamicTitle2 - Artist2.ogg.vlevel10%.transcode.mp3
04 - DynamicTitle12 - Artist3.mp3.vlevel10%.lossy.flac
Perhaps you'd create the second of those, then apply mp3gain to it to ensure its volume is matched to the other files you own to which you didn't apply vlevel.
With the third, to match 94 dB ReplayGain, you might have first applied vlevel while outputting to lossless, then recalculated ReplayGain and applied it when converting to lossyFLAC by way of lossywav with stdin input, its output piped to FLAC, after which you'd delete the intermediate lossless FLAC which had only had vlevel but not RG applied.
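If you're processing a lot of files, you could generate names in that style with a small script. This is only a sketch in Python; the function name and its arguments are hypothetical, and the tags ("RG94", "vlevel10%", "transcode", "lossy") are simply the ones used in my example names above.

```python
def dap_filename(track_no, title, artist, source_ext, steps, final_ext):
    """Build a name like '09 - Title - Artist.ogg.vlevel10%.transcode.mp3'.

    source_ext is only included when it differs from final_ext, so a file
    that was merely gain-adjusted in place keeps a single format extension.
    """
    parts = [f"{track_no:02d} - {title} - {artist}"]
    if source_ext != final_ext:
        parts.append(source_ext)
    parts.extend(steps)  # processing tags, in the order they were applied
    parts.append(final_ext)
    return ".".join(parts)

names = [
    dap_filename(1, "RockorPopTitle", "Artist1", "mp3", ["RG94"], "mp3"),
    dap_filename(9, "DynamicTitle2", "Artist2", "ogg", ["vlevel10%", "transcode"], "mp3"),
    dap_filename(4, "DynamicTitle12", "Artist3", "mp3", ["vlevel10%", "lossy"], "flac"),
]
for name in names:
    print(name)
```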
An optimal approach would probably be:
Step 1. Keep originals safe. Copy all files to the DAP folder on your PC.
Step 2. Apply a high ReplayGain like 94 dB without clipping prevention to the mp3 and aac (.m4a) files themselves using foobar2000, mp3gain or aacgain. (OK, for the Cowon D2, I believe that AAC isn't compatible, so you'd be forced to decode to FLAC or APE or transcode to lossyFLAC, mp3 or ogg vorbis)
Step 3. Apply ReplayGain while converting any lossless originals you have to the destination lossy format (mp3 or ogg vorbis would make sense for the Cowon D2, as might lossyFLAC if FLAC's simple decoding is found to extend battery life).
Step 4. If you have Ogg Vorbis source tracks/albums that show a ReplayGain of -3 to -7 dB, they're definitely close enough to leave alone given that the D2 doesn't support Vorbisgain (I'll take your word for this). In fact, if aiming for 94 dB SPL, I'd be tempted to leave alone vorbis albums with Album Gain values of -2 to -8 dB. Otherwise, if you need to adjust the volume of Vorbis sourced files, you'll be forced to convert to a large lossless file (FLAC/APE) while applying RG in fb2k (which will sound as good as the vorbis), or convert to a smaller lossyFLAC or transcode to mp3 (e.g. LAME -V1 or -V2 should sound pretty good if FLAC or lossyFLAC isn't acceptable).
Step 5. If you find any files that generally seem loud enough but where clipping is audible and annoying at the gain you require (hopefully you won't, though), go back to the source file and, in fb2k's converter, apply ReplayGain together with the Advanced Limiter to soften the clipping distortion. Then convert losslessly (FLAC/APE), convert to lossyFLAC, or transcode to a different lossy format.
Step 6. If you find files or albums that are too dynamic for the listening environment required, go back to the source files for those and use fb2k's converter with foo_dsp_vlevel at a subtle setting (Configure selected DSP) and convert to lossless. Then scan that lossless output track/album for ReplayGain. Then convert that lossless file with RG applied into the desired output format (lossless FLAC/APE, lossyFLAC or transcode to mp3 or ogg - whichever is different from the original source format).
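As a rough way of sorting a collection into those steps, here's a Python sketch of the decision for the default case (before you've found any files that need Step 5 or Step 6). The function name and return strings are hypothetical, and the +/-3 dB tolerance is simply my "-2 to -8 dB" suggestion for a 94 dB target expressed around the point where no change is needed.

```python
def suggest_step(source_format, album_gain_db=None, target_db=94.0,
                 reference_db=89.0, tolerance_db=3.0):
    """Suggest which of the steps above applies to a source file (illustrative only)."""
    fmt = source_format.lower()
    if fmt == "mp3":
        return "step 2: apply gain in place (mp3gain or foobar2000)"
    if fmt in ("m4a", "aac"):
        return "step 2 caveat: D2 may not play AAC, so decode to FLAC/APE or transcode"
    if fmt in ("flac", "ape", "wav"):
        return "step 3: apply RG while converting to mp3, ogg or lossyFLAC"
    if fmt == "ogg":
        if album_gain_db is None:
            return "step 4: scan for ReplayGain first"
        # Change in dB needed to reach the target; 0 means already there
        needed_db = (target_db - reference_db) + album_gain_db
        if abs(needed_db) <= tolerance_db:
            return "step 4: leave alone"
        return "step 4: decode with RG to FLAC/APE or lossyFLAC, or transcode to mp3"
    return "unknown format: check what the DAP supports"

print(suggest_step("ogg", album_gain_db=-5.0))
print(suggest_step("ogg", album_gain_db=1.0))
print(suggest_step("mp3"))
```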
Hopefully you'll get away with only Step 1 and Step 2, and perhaps Step 3 and won't need to adjust the volume of any Ogg Vorbis source files. That way you avoid transcoding and any form of audible dynamic range compression. The other steps, if necessary at all, will probably only be needed for a small proportion of your music unless your environment is so noisy that it's worth using vlevel on all your music, so the extra effort in reprocessing those files won't be too arduous.
But really, don't worry if you need to apply DRC for noisy environments. The music can remain enjoyable even when slightly compressed and transcoded, and will surely be far more enjoyable than music you can't hear below the volume of the background noise!
That should pretty much cover all bases.
One more thing, from this review of the D2: it seems that if you restore the factory settings and select USA instead of a European country, you might be given a greater volume output, which would help overcome the limitations of your IEMs in combination with the D2 (possibly down to an impedance mismatch).
QUOTE
The output power of the D2 is the most powerful of any Cowon device to date. 37 mW of power per channel (at 16 Ohm), giving you a total of 74 mW, are enough to power pretty much any pair of headphones. As far as I am aware, if you select any of the European countries on starting your player the maximum output is severely limited, but I always select USA so I can give my eardrums a maximum pounding through my Grados.