general purpose audio codec?

Hi all,

Is there any general purpose codec that would work well on all kinds of audio data? For example, if you were recording your daily activities, your data would consist of speech segments, environmental audio segments (from your surroundings) and so on. Is there an appropriate codec for this kind of data?

The closest to this I found was Siren 22 by Polycom. However, it's closed-source. 

Thanks in advance for answering the question.

Reply #1
I’m inclined to say almost any codec will do.
Basically it is about dynamic range and frequency.
16 bits will give you 96 dB of dynamic range, and 24 bits even 144 dB (probably more than your gear can cope with).
A 44.1 kHz sample rate allows for a frequency range up to almost 22 kHz.
Standard Red Book audio (16 bits/44.1 kHz sample rate) fulfills our needs pretty well.
You might use any format such as WAV or FLAC to cover this.
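
The figures above can be checked with a few lines of Python. This is only a sketch of the textbook relations (about 6.02 dB of theoretical dynamic range per bit of linear PCM, and a highest representable frequency of half the sample rate); the function names are made up for illustration.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bit_depth)

def nyquist_khz(sample_rate_hz: float) -> float:
    """Highest representable frequency is half the sample rate."""
    return sample_rate_hz / 2 / 1000

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
print(f"44.1 kHz sampling: up to {nyquist_khz(44100):.2f} kHz")
```

Real recording gear will land below these theoretical limits because of analog noise.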

I do think what you want (probably recording with realistic results) is not really related to a specific codec.

TheWellTemperedComputer.com

Reply #2
Thanks for the reply.

I would like to explain the scenario, however. Suppose, as I mentioned earlier, you are constantly recording your surroundings 24/7. WAV and FLAC won't work here because of their enormous storage requirements (due to their lossless nature, of course). In such a case, Ogg/Vorbis or MP3 can do pretty well and provide bitrates close to 64 kbps (more than enough). However, these codecs are primarily targeted at coding music and not general audio. They will work well on speech and environmental audio, but they cannot offer the high compression ratios that are possible if I use Speex or something similar. So, given the nature of these audio recordings (speech and surroundings), I can save more storage if I can intelligently decide on the audio source and apply the appropriate compression technique.

So my question is: is there any codec that takes into account the nature of the audio and then decides which technique to use?

Reply #3
Are you carrying around a PC all day to do this, or an embedded recorder? Because unless you have a laptop with a 12 hour battery, many of the formats you mention probably aren't an option anyway. You'd run out of battery life long before you ran out of storage.

If you're just rigging a portable voice recorder, I would just use MP3 or WavPack lossy. They're lightweight enough that you could reasonably encode them all day without draining the battery of a typical hacked-up recorder/MP3 player, and they compress quite well (about 0.5 to 1.5 GB per day).

(That said, in theory WMA can do this by switching between WMA Voice and WMA Pro, but not much supports it and of course you would have to purchase software from MS to do this.)

Reply #4
The idea is to build a recorder that would last up to 24 hours and require less storage (part of my research work). So I was thinking of writing an encoder that switches between Vorbis and Speex and puts both in the same Ogg container. Before I undertake this huge task, I just wanted to make sure that there's no other codec that can do this.

I am a bit apprehensive about MP3 because of all the patent-related issues, and also because at lower bitrates (40-64 kbps) Vorbis tends to perform better. LossyWav won't give me the compression ratios that I require.

Thanks for the reply

Reply #5
Quote: The idea is to build a recorder that would last up to 24 hours and require less storage (part of my research work).

24 hours is not very much data. Why exactly do you need compression at all?

Quote: LossyWav won't give me the compression ratios that I require.

LossyWav isn't the same thing as WavPack lossy. I was suggesting the latter since it's very light on battery to encode, but it sounds like you're using a PC anyway, so that doesn't really matter.

Reply #6
Sorry for leaving out details - I'm planning a small, wearable device powered by a couple of coin cells, so the stress is on ultra-low-power hardware design. The larger the flash memory used to store the data, the more power is consumed (1 GB of flash works within the power budget). Hence the need for compression.
Also, keeping the Vorbis or MP3 encoder on for compressing all the data is more than necessary, since most of my data is going to be speech or silence, and for that I can use Speex (lower complexity and higher compression ratio). I still want to use Vorbis/MP3 for compressing the environmental sounds, so I can't rule them out fully either.

Reply #7
WavPack sounds like a good idea. I think I will look into it further.


Reply #9
24 hours of uncompressed mono audio is approx. 8 GB. Flash in this range is really cheap. If you are going to use just a few cell batteries, I wouldn't expect you to get far with any kind of lossy encoding.
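
The 8 GB estimate can be reproduced with a short calculation. The sketch below assumes CD-style parameters (44.1 kHz, 16-bit, mono), which the post does not state explicitly, and adds a 16 kHz line purely as an illustrative comparison; the function name is made up.

```python
def pcm_gigabytes_per_day(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Size of 24 hours of uncompressed PCM in decimal gigabytes (10**9 bytes)."""
    bytes_per_second = sample_rate_hz * bit_depth * channels / 8
    return bytes_per_second * 24 * 60 * 60 / 1e9

print(f"44.1 kHz / 16-bit mono: {pcm_gigabytes_per_day(44100, 16):.2f} GB per day")
print(f"16 kHz / 16-bit mono:   {pcm_gigabytes_per_day(16000, 16):.2f} GB per day")
```

Lowering the sample rate or bit depth shrinks the total proportionally, before any codec is involved.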

Regarding the choice of codec, if you decide to use one anyway, I would choose LAME MP3 any day for any kind of content. It has been optimized so well that it encodes almost everything you throw at it without artifacts. Look at it this way: the music it is supposed to encode transparently is the hardest job for it, so encoding environmental audio should be no problem at all.
Can't wait for a HD-AAC encoder :P

Reply #10
I'll second the recommendation for LAME MP3, as it has universal hardware/software support. For the purposes you describe it'd probably make the most sense to use the Voice "preset" command line recommended in the LAME wiki, which forces mono and uses ABR:

Code:
--abr 56 -mm

From my experience this works quite well for audiobooks (I haven't done any field recording though). You can even reduce the ABR bitrate, and LAME will reduce the sample rate as needed.

You might also want to consider trying VBR (-V 9) with forced mono and perhaps a forced sample rate.  While I've never tried this, in theory VBR will handle long passages of (near-)silence better (i.e., using fewer bits and saving them for the more complex stuff).
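
If the encoding is scripted, the two variants above could be assembled along these lines. This is a sketch, not a tested recipe: it uses the spaced spelling `-m m` for forced mono, the 22.05 kHz value passed to `--resample` is an arbitrary assumption, and the file names and function name are hypothetical.

```python
import shlex
from typing import List

def lame_voice_command(src: str, dst: str, use_vbr: bool = False) -> List[str]:
    """Build a LAME command line for mono, speech-oriented encoding."""
    cmd = ["lame", "-m", "m"]  # force mono
    if use_vbr:
        # Smallest VBR setting plus a forced output sample rate in kHz
        # (22.05 is an assumed value, not a recommendation from the thread).
        cmd += ["-V", "9", "--resample", "22.05"]
    else:
        cmd += ["--abr", "56"]  # average bitrate of 56 kbps
    return cmd + [src, dst]

print(shlex.join(lame_voice_command("diary.wav", "diary.mp3")))
print(shlex.join(lame_voice_command("diary.wav", "diary.mp3", use_vbr=True)))
```

The returned list can be handed to `subprocess.run` on a machine where LAME is installed.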

HTH

Reply #11
Quote: Sorry for leaving out details - I'm planning for a small and wearable device powered by a couple of coin cells. So, the stress is on ultra low-power hardware design. Larger flash memories to store data, more the power consumed (1 GB flash memory works within the power budget).

That's not really how it works. You'll use a little bit of power writing out data to flash, but it's tiny compared to the power needed to actually compress something. If this thing really needs to run off a couple of coin cells for 24 hours, things like MP3, Vorbis, etc. are out of the question. Look into WavPack, but even then you're probably going to miss your power budget by a lot.

PCM is almost certainly your best bet. Buffer a couple of seconds of it to DRAM, clock up your flash, burst write your buffer, and then clock down the flash chip.
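
The buffering scheme described above can be sketched in Python as follows. This only models the control flow: `io.BytesIO` stands in for the flash chip, the class name is hypothetical, and the 16 kHz / 16-bit mono format and 2-second buffer are assumed values. On real hardware the flush step is where the flash would be clocked up and back down.

```python
import io

class BurstWriter:
    """Accumulate PCM bytes in RAM and write them to storage in large bursts."""

    def __init__(self, sink, sample_rate_hz=16000, bytes_per_sample=2, buffer_seconds=2):
        self.sink = sink
        self.capacity = sample_rate_hz * bytes_per_sample * buffer_seconds
        self.buffer = bytearray()
        self.burst_count = 0

    def push(self, pcm_bytes: bytes) -> None:
        """Queue incoming audio; flush whenever a full buffer is available."""
        self.buffer.extend(pcm_bytes)
        while len(self.buffer) >= self.capacity:
            self._flush(self.capacity)

    def _flush(self, nbytes: int) -> None:
        # Placeholder for: clock up flash, burst write, clock down flash.
        self.sink.write(bytes(self.buffer[:nbytes]))
        del self.buffer[:nbytes]
        self.burst_count += 1

    def close(self) -> None:
        """Write out whatever is left at the end of the recording."""
        if self.buffer:
            self._flush(len(self.buffer))

storage = io.BytesIO()      # stand-in for the flash chip
writer = BurstWriter(storage)
frame = bytes(320)          # 10 ms of 16 kHz, 16-bit mono silence
for _ in range(500):        # 5 seconds of audio
    writer.push(frame)
writer.close()
print(writer.burst_count, len(storage.getvalue()))
```

With these assumed numbers, five seconds of audio reach storage in three writes instead of 500 small ones.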

Quote: Also, if I keep the Vorbis or MP3 encoder on for compressing all the data, it is more than necessary, since most of my data is going to be speech or silence and I can then use Speex (lower complexity and higher compression ratio). I still want to use Vorbis/MP3 for compressing the environmental sounds. So, can't rule out them out fully either.

Generally anything running off a couple of batteries for more than a few hours has no FPU, and thus you won't be encoding Vorbis unless you're writing your own integer Vorbis encoder. Have you figured out which codecs are even possible to run on your hardware? And how big a battery pack you'll need to do it? IMO it's not really worthwhile asking about all these codecs if you can't actually run them.

Reply #12
It does take a non-trivial amount of power and time to write flash. Whether you'll have net power savings by compressing depends on the complexity of the compression and resultant bit rate. Power-wise, you may be better off with a low complexity and/or low bit rate encoder. Of course, you may find the sound quality unacceptable. Have a look at CELT and Speex.
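
The trade-off described here can be written as a simple inequality: compressing pays off only if the encoder's average power is less than the write energy saved per second. The sketch below is a deliberately crude model (it ignores idle power, controller overhead and erase cycles), and every number in it is a placeholder chosen for illustration, not a measurement of any real part.

```python
def compression_saves_energy(encode_power_w, write_energy_j_per_bit,
                             raw_bits_per_s, compressed_bits_per_s):
    """True if encoding plus writing the smaller stream costs less power
    than writing the raw stream."""
    raw_power = write_energy_j_per_bit * raw_bits_per_s
    compressed_power = encode_power_w + write_energy_j_per_bit * compressed_bits_per_s
    return compressed_power < raw_power

def break_even_encode_power_w(write_energy_j_per_bit, raw_bits_per_s, compressed_bits_per_s):
    """Largest average encoder power for which compressing is still a net win."""
    return write_energy_j_per_bit * (raw_bits_per_s - compressed_bits_per_s)

# Placeholder figures for illustration only.
WRITE_J_PER_BIT = 10e-9     # hypothetical flash write cost: 10 nJ per bit
RAW_BPS = 16000 * 16        # 16 kHz, 16-bit mono PCM (assumed format)
COMPRESSED_BPS = 32000      # hypothetical codec output rate

budget = break_even_encode_power_w(WRITE_J_PER_BIT, RAW_BPS, COMPRESSED_BPS)
print(f"Encoder must average under {budget * 1e3:.2f} mW to pay for itself")
```

Plugging in measured figures for the actual flash part and encoder would settle which side of the break-even point a given design falls on.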

Reply #13
You might consider just using run-length and Huffman coding, which would cut down the size of the raw data without using much processor power.

Also, choose your bit depth and sampling rate accordingly, the bit depth being probably the easiest to trim back.
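
As a rough illustration of the two techniques mentioned above, here is a minimal Python sketch. It run-length encodes a toy sample sequence and, separately, computes Huffman code lengths for the same samples; the sample values are invented, and a real device would need a fixed-point, table-driven implementation rather than this.

```python
import heapq
from collections import Counter
from itertools import groupby

def run_length_encode(samples):
    """Collapse runs of identical values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(samples)]

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {sym: 0}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Toy signal: mostly silence with a short burst (made-up values).
samples = [0] * 8 + [3, 3, -1] + [0] * 4
runs = run_length_encode(samples)
lengths = huffman_code_lengths(samples)
huffman_bits = sum(lengths[s] for s in samples)
print(runs)
print(lengths, huffman_bits, "bits vs", len(samples) * 16, "bits as 16-bit PCM")
```

Both steps are cheap because they need only counting, comparisons and table lookups, with no transforms or floating point.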

Reply #14
Maybe SBC is an option. This is Bluetooth's primary general purpose audio codec. It's quite simple/fast and gives you a quality per bit ratio that is probably similar to MPEG Layer 1. For example, you could use a sampling rate of 32 kHz and a data rate of about 128 kbps (for one channel). You'll find an SBC implementation in the libbluez source code.

Reply #15
Quote: Maybe SBC is an option. This is Bluetooth's primary general purpose audio codec. It's quite simple/fast and gives you a quality per bit ratio that is probably similar to MPEG Layer 1. For example, you could use a sampling rate of 32 kHz and a data rate of about 128 kbps (for one channel). You'll find an SBC implementation in the libbluez source code.

I was thinking something like 10 bit ADPCM @ 22 kHz. That's only about 220 kbps, and the compression is much more power efficient than doing a subband decomposition.

Reply #16
Do you expect to do any AGC to compensate for the large variations in loudness typically encountered?

-k

Reply #17
Thank you everyone for all the suggestions. I have definitely got a few things to think about before I start.

On the other hand, just out of curiosity, would it actually make any sense to write a new codec (not exactly new, taking the important properties of the good codecs out there and integrating them) that can handle multiple sources of audio with ease, be power-efficient and portable-device friendly (all of these at low bitrates, 4-8 kbps for speech, 40-64 kbps for the rest)? Just a thought.
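
To get a feel for what such a switching codec would buy in storage terms, here is a back-of-the-envelope Python sketch. The split of the day into silence, speech and environmental sound is entirely made up; only the bitrate ranges come from the post, and the function names are hypothetical.

```python
def average_kbps(segments):
    """Time-weighted average bitrate for (fraction_of_day, kbps) segments."""
    total_fraction = sum(fraction for fraction, _ in segments)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * kbps for fraction, kbps in segments)

def megabytes_per_day(kbps):
    """Storage for 24 hours at a constant bitrate, in decimal megabytes."""
    return kbps * 1000 / 8 * 86400 / 1e6

# Hypothetical day: the fractions are invented for illustration.
day = [
    (0.50, 0),    # silence, not stored
    (0.30, 8),    # speech at the upper end of the 4-8 kbps range
    (0.20, 64),   # environmental sound at the upper end of 40-64 kbps
]
avg = average_kbps(day)
print(f"average {avg:.1f} kbps, about {megabytes_per_day(avg):.0f} MB per day")
```

Under these assumptions the total stays well below the 1 GB flash budget mentioned earlier in the thread, though the result depends heavily on how the day actually splits.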

Reply #18
Quote: Thank you everyone for all the suggestions. I have definitely got a few things to think about before I start.

Out of curiosity, which CPU were you going to use?

Quote: On the other hand, just out of curiosity, would it actually make any sense to write a new codec (not exactly new, taking the important properties of the good codecs out there and integrating them) that can handle multiple sources of audio with ease, be power-efficient and portable-device friendly (all of these at low bitrates, 4-8 kbps for speech, 40-64 kbps for the rest)? Just a thought.

MS did that with the WMA9 family, so I guess it made sense to them. I've never actually seen anyone use it though, so I think in practice it hasn't been too popular. I think part of it is that voice codecs tend to be under completely different restrictions than audio codecs in most situations, so it's difficult to combine the two without giving up too much (in terms of latency, packet size, CPU power, memory, etc.).

Reply #19
The plan is not to use a CPU, but to develop an ASIC for better power efficiency. And yes, since there is no FPU, I have started writing an integer Vorbis encoder (following your earlier post).

Reply #20
Quote: Thanks for the reply. I would like to explain the scenario however. Suppose, as I mentioned earlier, you are constantly recording your surroundings 24/7. WAVs and FLACs won't work here because of their enormous storage requirements (due to their lossless nature, of course). In such a case, Ogg/Vorbis or MP3 can do pretty well and provide bitrates close to 64 kbps (more than enough). However, these codecs are primarily targeted towards coding music and not general audio. They will work well on speech and environmental audio, but they cannot offer high compression rates that are possible if I use Speex or something similar. So, given the nature of your audio recordings (speech and surroundings), I can make more savings on storage if I can intelligently decide on the audio source and apply the appropriate compression techniques. So my question, is there any codec that does take into account the nature of audio and then decide what technique to use?


This is precisely what the following upcoming audio coding standard will be for. We presently call it "Unified speech and audio coder", but that name will probably change. At high bit rates (32 kbps per channel and more), it's quite similar to HE-AAC, so for now I recommend using that.

www.gel.usherbrooke.ca/gournay/documents/publications/AES126_...pdf

Chris
If I don't reply to your reply, it means I agree with you.

Reply #21
Quote: The plan is not to use a CPU, but develop an ASIC for better power efficiency.

A couple of points:

1) Generally, to have an ASIC fabricated you need to order thousands of units. Are you planning to order that many devices?
2) There are a lot of commercially available ASICs that can do what you need without the enormous cost of fabricating a custom part.
3) Your ASIC will probably be based on some kind of CPU or DSP internally, unless you're really going to try to lay out the logic directly to encode a file, which I think would be staggeringly difficult. Have you thought about which kind you will use?

Quote: And yes, since there is no FPU, I have started writing an integer vorbis encoder (from your earlier post).

I think this is probably not going to be worthwhile because of the power requirements. You should probably pick a format that's well suited to what you want to do, and I don't think that's going to be Vorbis, or any perceptual codec for that matter. It's probably going to be some kind of PCM variant, so that you can keep the processing on your device to an absolute minimum.

Reply #22
In case anyone is interested in technical details (and has the time and financial resources):

At next week's IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011) in Prague, a colleague of mine and I will be presenting some of the recent work that was accepted for integration into the "unified speech and audio codec" I mentioned above, namely:

Stereo coding at bit rates of ~96 kb/s and higher:
AASP-P10.4: EFFICIENT TRANSFORM CODING OF TWO-CHANNEL AUDIO SIGNALS BY MEANS OF COMPLEX-VALUED STEREO PREDICTION

Design of the arithmetic coder replacing Huffman coding:
AASP-P10.3: EFFICIENT CONTEXT ADAPTIVE ENTROPY CODING FOR REAL-TIME APPLICATIONS

See www.cmsworldwide.com/ICASSP2011/Papers/PublicSessionIndex3.asp?Sessionid=1034 for details. One detail which is probably interesting for some forum members: to demonstrate the advantage that the new stereo coding tool has over traditional tools, we conducted a formal blind test including two items from HA: BerlinDrug and Waiting.

Chris

Reply #23
Chris,

It's a surprise that USAC will have improved efficiency at 96 kbps and higher. Until now all signs indicated that it would be another low bitrate codec and that LC-AAC would still be used for >80 kbps.
Looking at the list of new coding techniques, there is a good chance that USAC will offer a substantial improvement in coding efficiency over LC-AAC.

Reply #24
Quote: The plan is not to use a CPU, but develop an ASIC for better power efficiency.

Quote: A couple points: 1) Generally to have an ASIC fabricated you need to order thousands of units. Are you planning to order that many devices?

I've worked on a few designs where the volume was in the hundred range. In those cases the chips were going into a small quantity of very expensive products like satellites or CAT scanners.

You can also have chips made through MOSIS, which combines multiple designs/customers on one mask set so you aren't fighting the economics of making whole wafers in one design. The smallest standard quantity I saw listed is 40.