IPB

IETF Opus codec now ready for testing, That's CELT 0.11
jmvalin
post Feb 4 2011, 17:36
Post #1


Xiph.org Speex developer


Group: Developer
Posts: 473
Joined: 21-August 02
Member No.: 3134



We'd like to announce that the Opus codec is now ready for testing. The bit-stream is now in a "pseudo-freeze", which means that unless a problem is found during testing/review, there are no longer any changes planned. The only exceptions to this are the SILK-mode FEC and the stereo SILK mode, which should be landing in the next few days. Considering that these are not critical features, we felt the testing phase could already begin.

You can get the source code in two different ways. There is a release tarball at http://jmvalin.ca/opus/opus-0.9.0.tar.gz . You can also extract the source directly from the I-D ( http://www.ietf.org/id/draft-ietf-codec-opus-02.txt ) with the following command:

cat draft-ietf-codec-opus-02.txt | grep '^\ \ \ ###' |
sed 's/\s\s\s###//' | base64 -d > opus_source.tar.gz
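For anyone who'd rather not wrangle grep/sed, the same extraction can be sketched in Python (a hypothetical equivalent of the pipeline above; the three-spaces-plus-`###` prefix is taken from the command itself, not verified against the draft):

```python
import base64

def extract_embedded_tarball(draft_path, out_path):
    """Collect the base64 payload lines (those prefixed with three
    spaces and '###') from the I-D text, strip the prefix, and decode
    the concatenated payload into the source tarball."""
    payload = []
    with open(draft_path) as f:
        for line in f:
            if line.startswith("   ###"):
                payload.append(line[len("   ###"):].strip())
    with open(out_path, "wb") as out:
        out.write(base64.b64decode("".join(payload)))
```

Usage would be `extract_embedded_tarball("draft-ietf-codec-opus-02.txt", "opus_source.tar.gz")`.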

Now the Opus codec supports three modes, one of which is identical to CELT 0.11, which was just released (though not officially announced yet).

It would be nice if the HA community could help test this codec. Opus targets a very wide range of bit-rates, from 6 kb/s narrowband speech up to 510 kb/s stereo music. Perhaps a 64-96 kb/s stereo music test would be interesting to do. Would anyone like to help?
mudlord
post May 5 2011, 11:33
Post #2





Group: Developer (Donating)
Posts: 797
Joined: 1-December 07
Member No.: 49165



Did some testing....
I noticed one particularly bad sample....

CODE
foo_abx 1.3.4 report
foobar2000 v1.1.7 beta 1
2011/05/05 20:12:50

File A: E:\music\modules\buzz_rtltune.XM
File B: C:\Users\mud\Desktop\buzz - RTL.tune001  .ogg

20:12:50 : Test started.
20:13:18 : 00/01  100.0%
20:13:31 : 01/02  75.0%
20:14:32 : 02/03  50.0%
20:14:57 : 03/04  31.3%
20:15:47 : 04/05  18.8%
20:16:06 : 05/06  10.9%
20:17:49 : 06/07  6.3%
20:18:00 : 07/08  3.5%
20:18:07 : 08/09  2.0%
20:18:20 : 09/10  1.1%
20:18:27 : 10/11  0.6%
20:19:02 : 11/12  0.3%
20:19:09 : 12/13  0.2%
20:19:17 : 13/14  0.1%
20:19:25 : 14/15  0.0%
20:19:33 : 15/16  0.0%
20:19:46 : 16/17  0.0%
20:20:04 : 17/18  0.0%
20:20:12 : 18/19  0.0%
20:20:20 : 19/20  0.0%
20:20:27 : 20/21  0.0%
20:20:30 : Test finished.

----------
Total: 20/21 (0.0%)


Bitrate is 96kbps.
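(For reference, the percentages foo_abx prints are one-sided binomial p-values: the probability of scoring at least that many correct by guessing. A minimal sketch of the arithmetic, my reconstruction rather than foo_abx's actual code:)

```python
from math import comb

def abx_p_value(correct, trials):
    """P(X >= correct) for X ~ Binomial(trials, 1/2): the chance of
    doing at least this well by flipping a coin on every trial."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Matches the log above: 3/4 -> 0.3125 (shown as 31.3%),
# and 20/21 -> ~1e-5 (shown as 0.0%).
```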

Note that the OGG file is indeed encoded with CELT, and a packet decoder was used in FB2K for native CELT playback and ABXing. Can't give a link to the packet decoder yet as it's still pre-alpha and it's Peter's work, too.

Since the XM file is public, I don't see the harm in posting a complete sample and the original XM file.
Original file: http://mudlord.emuxhaven.net/buzz_rtltune.XM
CELT: http://mudlord.emuxhaven.net/buzz_rtltune.ogg

On the cymbals there is a noticeable volume difference; not sure how to describe it, but it sure as hell seems like an artifact.
Used the most recent foo_dumb component for the original XM.

This post has been edited by mudlord: May 5 2011, 11:37
mudlord
post May 7 2011, 15:01
Post #3





Group: Developer (Donating)
Posts: 797
Joined: 1-December 07
Member No.: 49165



Peter fixed this; now I can't ABX the samples.
The issue was on the decoder end.
Tom Servo
post Jun 19 2011, 00:46
Post #4





Group: Members
Posts: 168
Joined: 11-January 02
Member No.: 981



Sorry for looking like a tool for asking this, but with support for up to 500kb/s bitrate for music, does that mean that CELT/Opus is also going to be positioned into regular audio-to-file encoding like MP3 and AAC?
NullC
post Jul 27 2011, 20:04
Post #5





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



QUOTE (Tom Servo @ Jun 18 2011, 16:46) *
Sorry for looking like a tool for asking this, but with support for up to 500kb/s bitrate for music, does that mean that CELT/Opus is also going to be positioned into regular audio-to-file encoding like MP3 and AAC?


The encoder (and tools) will need to mature some, of course, but yes: It's suitable for stored file usage too. That wasn't an original goal, but we kept pushing the quality envelope to the point where this is a realistic application of the codec.
mudlord
post Sep 4 2011, 23:13
Post #6





Group: Developer (Donating)
Posts: 797
Joined: 1-December 07
Member No.: 49165



Oh, since you're here, did jmvalin add the resampler yet?
romor
post Sep 4 2011, 23:44
Post #7





Group: Members
Posts: 650
Joined: 16-January 09
Member No.: 65630



subscribe?


--------------------
scripts: http://goo.gl/M1qVLQ
mudlord
post Sep 5 2011, 12:58
Post #8





Group: Developer (Donating)
Posts: 797
Joined: 1-December 07
Member No.: 49165



Ah yes, just noticed that: opusenc includes the resampler.
Yay.
FreaqyFrequency
post Oct 13 2011, 16:04
Post #9





Group: Members
Posts: 58
Joined: 4-October 11
From: VA Beach, VA
Member No.: 94145



I'll go ahead and bump this thread as well, while we're talking about it in a different thread. smile.gif

Have we heard any more news from anyone on this front? Hopefully we're nearing a bitstream freeze, assuming most of the issues have been ironed out and dealt with.


--------------------
FLAC -2 w/ lossyWAV 1.3.0i -q X -i
Speckmade
post Nov 26 2011, 15:05
Post #10





Group: Members
Posts: 36
Joined: 15-February 05
Member No.: 19848



The codec's featureset sounds as if it could be a promising candidate for a new free standard for lossy audio, following the beloved Vorbis codecs. Therefore I wonder whether it is in principle possible to use it for multichannel audio (e.g. 5.1-channel movie audio). Could there be an encoder for multichannel material in the future?
FreaqyFrequency
post Nov 27 2011, 04:57
Post #11





Group: Members
Posts: 58
Joined: 4-October 11
From: VA Beach, VA
Member No.: 94145



I'm hoping that Opus will be used more widely as a VoIP codec (since low latency necessarily means compromising coding efficiency anyway), and that xiph's new brainchild Ghost will be what replaces Vorbis for music archival. So I'd be more okay with Ghost getting the multichannel support over Opus, so long as Ghost comes along, that is.

I'm very tempted to try to learn more about coding just to try to lend my own hand to the process in whatever way possible.


--------------------
FLAC -2 w/ lossyWAV 1.3.0i -q X -i
greensdrive
post Nov 27 2011, 16:20
Post #12





Group: Members
Posts: 28
Joined: 20-May 11
Member No.: 90802



I was under the impression that low latency usually (if not always) means higher quality. Just an impression I got.

A couple of days ago I compiled opusenc/opusdec, and apparently multichannel is planned for Opus. I thought this because the --bitrate option is described as "6-256 per-channel". More on target, commit 6dd8086d in users/greg/opus-tools.git says "First cut at working multichannel support".
Speckmade
post Nov 28 2011, 03:11
Post #13





Group: Members
Posts: 36
Joined: 15-February 05
Member No.: 19848



QUOTE (FreaqyFrequency @ Nov 27 2011, 04:57) *
I'm hoping that Opus will be used more widely as a VoIP codec (since low latency necessarily means compromising coding efficiency anyway), and that xiph's new brainchild Ghost will be what replaces Vorbis for music archival. So I'd be more okay with Ghost getting the multichannel support over Opus, so long as Ghost comes along, that is.

In fact, Opus is one of the outcomes of the Ghost project - next to Monty's plans, which will maybe stay in a state that can't even be called proper vaporware.
Opus is what we've got, and it's damn good. The compromised quality is speculation - look at the facts: the prototypes clearly beat HE-AAC; it has less complexity and more efficiency than Vorbis; it's competitive over a much broader bitrate range; it has an official recommendation as an internet standard; probably it'll also be endorsed by the ITU - and on top of all that there's the super-low latency and a specialised built-in speech mode. What else could I wish for? Also, you can kind of "turn off" the low latency: have you noticed that you can turn the block size up to almost 60 ms?

QUOTE (greensdrive @ Nov 27 2011, 16:20) *
a couple days ago, I compiled opusenc/opusdec, and apparently they plan multichannel for Opus. I thought this because of the --bitrate option being "6-256 per-channel". more on target, commit 6dd8086d in users/greg/opus-tools.git says "First cut at working multichannel support".

Given the VoIP background, where monaural audio is still predominant, could it be that "multichannel" means no more than stereo? I've never heard them speak of anything beyond that...
greensdrive
post Nov 28 2011, 04:46
Post #14





Group: Members
Posts: 28
Joined: 20-May 11
Member No.: 90802



QUOTE (Speckmade @ Nov 27 2011, 20:11) *
could it be that "multichannel" means no more than stereo? I've never heard them speak of something beyond...

No. I have since deleted my compiles, but looking at the source code (at line 127 of opusenc.c in Gregory Maxwell's opus-tools.git):
CODE
--downmix-stereo   Downmix to stereo (if >2 channels)

Note that it's "greater than two channels"... I didn't test this, though.

also (line 116 and 117 of the same file):
CODE
--speech           Optimize for speech
--music            Optimize for music

So I assume that xiph.org is not making this for speech-only environments. In other words, it may have a speech background, but they are intentionally making room for music as well.

This post has been edited by greensdrive: Nov 28 2011, 04:47
Speckmade
post Dec 1 2011, 01:55
Post #15





Group: Members
Posts: 36
Joined: 15-February 05
Member No.: 19848



QUOTE (greensdrive @ Nov 28 2011, 04:46) *
QUOTE (Speckmade @ Nov 27 2011, 20:11) *
could it be that "multichannel" means no more than stereo?

no.

Oh, nice, thanks. - I'm happy to hear that! :-)
NullC
post Dec 1 2011, 04:08
Post #16





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



QUOTE (Speckmade @ Nov 27 2011, 18:11) *
Opus is what we've got, and it's damn good. The compromised quality is speculation - look at the facts: The prototypes clearly beat HE-AAC, less complexity and more efficiency than Vorbis, competitive on a much broader bitrate range, official recommendation as an internet standard, probably it'll also be endorsed by the ITU - and on top of all that the super-low latency, specialised speech mode built-in - what else could I wish for? Also, you can kind of "turn off" the low latency: Have you noticed that you can turn up the block size to almost 60 ms?..


I don't know so much about less complexity. In the MDCT modes our encoder is quite low-complexity compared to typical codecs for lossy music (e.g. Vorbis / AAC / HE-AAC), yes. But for communications purposes you need both an encoder and a decoder, so we've traded a bit there— the decoder is more complex, though it needs much less memory, so in practice it's easier to implement the decoder on many small devices than Vorbis. (And especially less than the "worst case" Vorbis decoder, which needs something like 32 MB of RAM.)

Don't hold your breath on an ITU endorsement. The ITU, both officially and via its participants, has opposed the process, though perhaps there will be some kind of reversal after the fact. The politics have been extensive and have substantially delayed the finalization of the codec (but that's okay, we put the time to good use squashing bugs in the SILK stuff)... and perhaps some day I'll get to write a political intrigue novel about all the crazy nonsense that has gone on, with people trying their best to block/slow the process.

Turning the size up to 60 ms doesn't really matter much except at very low bitrates: it only saves a few bytes from somewhat better prediction and eliminating a bit of overhead.
(And, in fact, there is a bug in the currently released encoder that prevents high bitrates for 40/60 ms frames, though the fix, which is in my tree, should be merged within a few days.)

We couldn't have 'real' large frames without doubling the worst-case memory usage— and just adding them wouldn't actually help much because of all the other design decisions centered around low latency/small frame sizes. At one point I switched to 2k frames and wasn't able to get it to sound good in a couple of hours of twiddling. (And, in fact, at very high rates I've seen some evidence that our 10 ms frames may be generally better than the 20 ms ones for many signals. Future encoders will likely do smart things with automatic frame-size selection.)

In any case, the only samples that really seem to suffer from the small transform size are highly tonal ones with irregular tonality (e.g. harpsichord). These are rare enough that for unconstrained-VBR users (the same people who wouldn't mind a codec with 100 ms delay, I assume) we can just detect those frames and boost their rate. Jean-Marc has a branch that does this. The results are quite impressive. I expect the Hydrogenaudio folks to be quite pleased.

QUOTE
Given the VoIP background, where monaural audio is still predominant, could it be that "multichannel" means no more than stereo? I've never heard them speak of something beyond...


As noted, this is actual surround support. Opus will never be an awesome low-rate surround codec: our coupling model is too limited— and the limits are fundamental. But it's well defined for the sake of being One Codec To Rule Them All. Then again, it seems all the rage in surround coding is this parametric stuff and, personally, I think all the parametric stereo/surround I've heard sounds like crap. I'd prefer mono audio to motion sickness kthnx.

The opus-tools in my repository is fairly raw at this point but I'd like to get it up to initial release grade soon. I'm trying to balance time between working on that and on the codec. It would be helpful to me for some people to try it out and give me feedback. (I know opusdec gets the output channel order wrong, before anyone reports that — I haven't finished the multichannel work there yet).

There is also ffmpeg and gstreamer support in the works. The container support should be final now, but I'm not prepared to call it final until I've done interoperability testing with multiple implementations. If anyone is working on support in applications please join #opus on freenode. (There is a web client on the opus-codec webpage, of course everyone else is welcome too).
IgorC
post Dec 1 2011, 15:44
Post #17





Group: Members
Posts: 1506
Joined: 3-January 05
From: Argentina, Bs As
Member No.: 18803



Speaking of complexity, things are changing quickly. An iPad can play LC-AAC for 140 hours, or HE-AAC v2 for approx. 60 hours. Eventually the display consumes much more, so a user will run out of battery much faster anyway.

During the last two years there was a big breakthrough in energy efficiency. It's no surprise that embedded and mobile devices can handle decoding OptimFROG's slowest mode, HD video, etc.
punkrockdude
post Jan 15 2012, 11:45
Post #18





Group: Members
Posts: 243
Joined: 21-February 05
Member No.: 20022



Will the CELT encoder accept 24-bit WAV files? For fun, I am trying out how different lossy encoders manage 24-bit audio. Regards.
Caroliano
post Jan 20 2012, 05:48
Post #19





Group: Members
Posts: 67
Joined: 21-December 05
Member No.: 26559



I saw the IETF draft and, as expected, the final word on the specification will be the reference implementation itself. I suppose that means that, like VP8 and others, any bug or quirky platform-dependent behavior in the reference implementation will become the standard itself. This is especially worrying because SILK was closed source for a long time.

So, I would like to know if the IETF specification was tested. That is: whether someone has tried to write at least a decoder based on it. I haven't found any Opus implementation besides the reference one. Maybe you guys should talk with the ffmpeg/libav people, who will probably write their own Opus implementation sooner or later anyway, about doing that. They may even give some useful advice on the codec specification.

And are there any plans for an "Opus-HD" (high delay) in the future? laugh.gif

Or is the low-delay design too fundamental to the codec, and would some additional, bigger window overlaps and frame sizes be too difficult to implement or not make enough difference? I guess that with the economy in signaling bits, there are no bits left to extend the codec... right?

Anyway, Opus is looking great as it is. I hope to see it everywhere soon! wink.gif

This post has been edited by Caroliano: Jan 20 2012, 05:49
rillian
post Jan 23 2012, 23:49
Post #20





Group: Members
Posts: 1
Joined: 23-January 12
Member No.: 96660



QUOTE (Caroliano @ Jan 19 2012, 20:48) *
So, I would like to know if the IETF specification was tested. That is: whether someone has tried to write at least a decoder based on it.


Tim Terriberry has written an alternate implementation. As I understand it, this is based on reading the reference code, and then trying to implement the same ideas in a different way. The CELT side works; the SILK side is close, but doesn't yet pass all the compliance tests.

So there are two independent implementations, even if one of them is actually the spec.
NullC
post Jan 24 2012, 00:28
Post #21





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



QUOTE (Caroliano @ Jan 19 2012, 20:48) *
I saw the IETF draft and, as expected, the final word on the specification will be the reference implementation itself. I suppose that means that, like VP8 and others, any bug or quirky platform-dependent behavior in the reference implementation will become the standard itself. This is especially worrying because SILK was closed source for a long time.


The SILK part of the codebase has seen a year and a half of open-source development, along with substantial rewriting and many boneheaded mistakes being fixed by the original developers, the folks from the CELT development team, and other participants. The SILK part of the bitstream isn't even remotely compatible with the original due to the fixes, redesigns, etc.

Quirky platform-dependent behavior was minimized by concurrent development on many platforms, and by simply producing a completely portable implementation. (Though, admittedly, the SILK code is influenced by the fast operators on ARM— though I don't think anyone would find much reason to complain about that— and it's no accident.)

QUOTE
So, I would like to know if the IETF specification was tested. That is: whether someone has tried to write at least a decoder based on it. I haven't found any Opus implementation besides the reference one. Maybe you guys should talk with the ffmpeg/libav people, who will probably write their own Opus implementation sooner or later anyway, about doing that. They may even give some useful advice on the codec specification.


Tim has written an almost complete no-code-sharing reimplementation of the format (http://people.xiph.org/~tterribe/celt/opus...03-float.tar.gz), which in many places was designed to be maximally different in approach from the reference, in order to validate that the implementation flexibility we believed was there actually was there.

The MDCT (formerly CELT) modes have been complete in it for a long time— almost a year— while the LP/hybrid parts were only recently written and still have some incomplete cases. The reimplementation discovered a number of erroneous behaviors of the sort you're concerned about, and they've been corrected.

We asked the libav folks in the past and didn't get much interest— moreover, after working with their libvorbis implementation to fix some non-conformance a year or so ago, I think that if Opus were done in the same style it wouldn't provide much spec-validating reimplementation value, since a lot of the code was clearly copied from libvorbis and then just modified to meet libav conventions. (I don't say this to begrudge their efforts— they did a good job speeding up some parts— and the fact that it duplicates libvorbis line for line made it much easier to track down some cases where their type conversions caused overflows.)

The reference implementation of Opus is itself also a bit more like one and a quarter implementations— because it includes both fixed-point and floating-point versions (though with a healthy amount of code sharing).

The presentation from IETF 82 covers many of the things we did for testing, specifically with these concerns in mind http://www.ietf.org/proceedings/82/slides/codec-4.pdf

A point here is that the style of "the spec is not code" also means "the spec is not executable", which in turn means "the spec itself can never be completely tested". In that case all you can do is test some implementations of the spec and hope that bugs in the spec carried through rather than being identically fixed by accident by the implementers. If it were to turn out that the implementations and the spec did not agree, what would you do? "Fix" the implementations, thus breaking the interoperability which is the whole point of a standard? Or "fix" the spec, which would really imply that the implementations were the real specification all along?

I'd argue that the value is in having multiple implementations, not in having a description of what the code should be doing that is precise enough for someone to implement it cold, but not precise or formal enough to actually be executable itself. And we have evidence from multiple implementations— though it would be nice to have had more of it.

(In some hypothetical ideal world of unbounded resources, I'd love to have had a team writing a formal specification in Coq from which an executable version could be mechanically extracted and compared to a practical implementation— a solution which would address many concerns. But sadly, on the list of priorities, that kind of resource investment falls below many other things: below improving perceptual quality, below patent-clearance auditing, below QA on the code that millions of people will actually be running and with which interop will be essential no matter what the spec says, etc.)

QUOTE
And is there any plans for an "Opus-HD" (high delay) in the future? laugh.gif

Or the low-delay design is too fundamental to the codec design, and some additional bigger window overlaps and frame sizes will be too difficult to implement/not make enough difference? I guess that with the economy in signaling bits, there is no bits left to extend the codec... right?


No, no plans for high delay. There is plenty of room to improve things via a smarter encoder while remaining compatible. I don't see any value in incompatible extensions— if we were going to be incompatible we could do much better by throwing things out, and without the low-delay assumptions we'd do some things quite differently.
bawjaws
post Mar 16 2012, 14:47
Post #22





Group: Members
Posts: 173
Joined: 10-December 02
Member No.: 4043



QUOTE (NullC @ Jan 23 2012, 16:28) *
The presentation from IETF 82 covers many of the things we did for testing, specifically with these concerns in mind http://www.ietf.org/proceedings/82/slides/codec-4.pdf


As someone who is occasionally depressed by the half-assery that often passes for software development, that's an inspiring document. Someday (hopefully soon) all standards work will be done like this.
.alexander.
post Mar 27 2012, 14:14
Post #23





Group: Members
Posts: 73
Joined: 14-December 06
Member No.: 38681



CELT/Opus was designed before 2/06/12, and I'm curious: does it have any advantages over iSAC?
NullC
post Apr 10 2012, 00:29
Post #24





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



QUOTE (.alexander. @ Mar 27 2012, 05:14) *
CELT/Opus was designed before 2/06/12, and I'm curious: does it have any advantages over iSAC?


I'm not aware of any good listening tests — just going from the design, I expect Opus to outperform iSAC at iSAC's bitrates, but a comparison would be interesting to see. The ex-GIPS folks from Google spoke pretty highly of Opus to me, though perhaps they were just being polite. smile.gif

Beyond that, Opus scales to (near-)perceptual transparency, while iSAC doesn't even support a high enough sampling rate for transparency. Opus supports much lower latencies (down to 5 ms, vs ~33 ms or so). Opus was developed with a public, open and transparent process, while iSAC is a formerly proprietary codec whose apparently royalty-free licensing is limited to WebRTC use. For what iSAC does they may be fairly close, but Opus is just a lot more versatile.
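The ~5 ms figure quoted above comes from the shortest CELT frame plus the transform's lookahead; a small sketch of that arithmetic (the 2.5 ms overlap value is my assumption about the codec's MDCT modes, not something stated in this thread):

```python
# Algorithmic delay of Opus's MDCT (CELT) modes: one frame must be
# buffered, plus lookahead equal to the window overlap (2.5 ms is
# assumed here; see the lead-in).
CELT_OVERLAP_MS = 2.5
CELT_FRAME_SIZES_MS = [2.5, 5.0, 10.0, 20.0]

def algorithmic_delay_ms(frame_ms, overlap_ms=CELT_OVERLAP_MS):
    """Minimum one-way codec delay for a given frame duration."""
    return frame_ms + overlap_ms

# The shortest frame (2.5 ms) gives the ~5 ms lower bound.
```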

This post has been edited by NullC: Apr 10 2012, 00:30
polemon
post May 4 2012, 14:42
Post #25





Group: Members
Posts: 137
Joined: 1-April 09
Member No.: 68578



Can we have some updates about Opus? There isn't much info on the opus-codec.org page; in fact, it seems pretty stagnant. Maybe something like a technology readiness level?

The last thing I've seen was the presentation about Opus. I was looking through the IETF website (some links from this thread seem dead), but even there there seem to be close to no changes.

The only sign of change, and therefore activity, was the modification date of this document: http://datatracker.ietf.org/doc/draft-ietf-codec-opus/
From the history table ( http://datatracker.ietf.org/doc/draft-ietf...c-opus/history/ ) it seems the codec is somewhat in an idle state, waiting for approval from other departments.

The reason I'm asking about updates is that there is very little discussion going on about Opus being used as an archiving codec. I know the codec is designed to be an interactive codec, for VoIP and the like, but the presentation I watched suggested that it is also very capable of producing transparent results at lower bitrates than Vorbis or even AAC. So, does Opus make sense as an archiving codec for music collections? I haven't found any documents discussing this.


--------------------
-EOF-
