Full Version: New High-Performance Audio Codec
NumLOCK
I've just finished a prototype of a new audio codec, based on neural networks.

Basically, the codec doesn't estimate how much distortion can be tolerated on each audio sample. Instead, it estimates how much of that distortion will be remembered by the listener after a short amount of time.

One strange property of this is that small sections (a few tenths of a second) are typically easily ABXable from the original!

A longer section, starting from ~4 seconds, cannot be ABXed anymore. In other words, the precise impression caused by the section of sound is very accurately reproduced. This is (psycho^2)acoustics rather than psychoacoustics :lol:

In practice, this approach allows for very deep cuts in precision and very rough quantization, even in the mid-range of music, all without any perceptible loss!

The codec is completely hybrid: 256 subbands using DWT (discrete wavelet transform) for near-static signals, and pure time-domain processing (not even subbanding!) for near-perfect transient handling.

The bit-reduction is huge, even during transients - thanks to the vector rotated-wavelet coding.

As of today, I reach transparent quality (or very near to it; I only tested ~75 samples) at around 43 kbps average (for 48 kHz, 16-bit, stereo). A 5.1-channel encoding brings that to ~58 kbps with perceptual channel coupling enabled.
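
To put those numbers in perspective, here is a minimal sketch of the arithmetic behind them, comparing the claimed bitrates with uncompressed PCM at the same format. The function names are made up for illustration, and the 5.1 figure assumes six full-bandwidth 48 kHz, 16-bit channels, which the post does not actually state.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(pcm_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the PCM source."""
    return pcm_kbps / coded_kbps


# Stereo: 48 kHz, 16-bit, 2 channels, claimed 43 kbps average.
stereo_pcm = pcm_bitrate_kbps(48_000, 16, 2)
stereo_ratio = compression_ratio(stereo_pcm, 43)

# 5.1: assumed here to be 6 channels in the same format, claimed ~58 kbps.
surround_pcm = pcm_bitrate_kbps(48_000, 16, 6)
surround_ratio = compression_ratio(surround_pcm, 58)

print(f"stereo PCM: {stereo_pcm:.0f} kbps, ratio about {stereo_ratio:.1f}:1")
print(f"5.1 PCM:    {surround_pcm:.0f} kbps, ratio about {surround_ratio:.1f}:1")
```

Under those assumptions the claim works out to roughly 36:1 for stereo and close to 80:1 for 5.1.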

The astonishing thing is that entropy coding is disabled for now. With it enabled, we can expect stereo near-CD quality, perceptually approaching an 85 dB SNR at GSM bitrates.

For the moment, no lowpass filters were used. I don't think they would bring much of an advantage, though.

Edit: after the first moment of euphoria, a bit more clarity was added ;)
JohnV
Ok... Thank god this is just one time in a year.
Mac
Surely it would be quicker to say you found hardware support for MPC? :P
NumLOCK
QUOTE(JohnV @ Apr 1 2003 - 10:23 AM)
Ok... Thank god this is just one time in a year.

One time in a year, what are you talking about.. for such a breakthrough, one time in history is enough :lol:

QUOTE
Surely it would be quicker to say you found hardware support for MPC?

Just wait a year or two, and many portable players will run a real OS (just think Zaurus..). Then you only need software support ;)
KikeG
Damn! I almost believed it. Although it sounded pretty strange, the style of writing made it believable.. Here in Spain it should have been on the 28th of December, not today ;)
rjamorim
QUOTE(JohnV @ Apr 1 2003 - 06:23 AM)
Ok... Thank god this is just one time in a year.

I so agree...
NumLOCK
QUOTE(rjamorim @ Apr 1 2003 - 03:45 PM)
QUOTE(JohnV @ Apr 1 2003 - 06:23 AM)
Ok... Thank god this is just one time in a year.

I so agree...

AggGGGgggggggrrrrr...

QUOTE
Damn! I almost believed it. Although it sounded pretty strange, the style of writing made it believable..

Thanks! I tried to enumerate lots of blatant impossibilities, while staying as serious as possible :lol:

Would like to see such a codec though.
Bedeox
This might be theoretically possible... (if we hacked our brain, that is)
But the algorithm would be VERY complex, very slow, very very cpu intensive.
A mainframe or two would suffice to encode one file a day. Maybe.

<edit>
Great fragment about wavelet encoding for static signals, time-domain for 'perfect' transients.
This would surely beat the worst LAME settings.
</edit>

<edit>
You forgot to add a link about this new technology...
</edit>

<edit>
Yeah, the impression might be reproduced, but sound.... naaaah. ;)
</edit>
Oge_user
[1st April]
With WMA I reach CD Quality at 64kbps.
[1st April/]
de Mon
QUOTE(Bedeox @ Apr 1 2003 - 08:38 AM)
This might be theoretically possible... (if we hacked our brain, that is)
But the algorithm would be VERY complex, very slow, very very cpu intensive.


I don't think so.
1. The human brain isn't as difficult to 'hack' as people say.
2. We have already reached the needed CPU power. CPU usage depends on the software implementation. I had an Arkanoid game on the Atari 800 (16 KB) and now have an Arkanoid of about 16 MB, and the latter has no advantages over the former. Growing CPU frequency makes programmers lazy.

Sorry for the off-topic.
Bedeox
If you take CURRENT CPUs into account, that would be VERY complex
(look at speech to text conversion, this requires much processing power and is still very primitive,
although algorithms are quite good... check LAME on a 386SX CPU, that would be similar)
Gecko
So in the very moment I listen to the music it sounds horrible, but I forget that very soon (after ca. 4 seconds). So every 4 seconds I go "yuck" again? I am sure that even with Petri Net Simulation you couldn't erase those subconscious effects from my brain. So 4 seconds after the music stops, my conscious thought will say: my, that sounded just like it should, while my subconscious is a total wreck. You should put up a disclaimer: Not for the mentally weak!
de Mon
QUOTE(Bedeox @ Apr 1 2003 - 01:38 PM)
If you take CURRENT CPUs into account, that would be VERY complex
(look at speech to text conversion, this requires much processing power and is still very primitive,
although algorithms are quite good... check LAME on a 386SX CPU, that would be similar)

Yes, single-CPU computers can't handle such a task today.
As for text to speech, try these:

http://www.voiceware.co.kr/english/demo/demo_text.html
http://www.rhetorical.com

About brain hacking: I think it has already become possible. But the project would be huge, so the main problem is project coordination, especially between medics, biologists and programmers. A good biologist or medic who knows the brain well spends all his time, or even his life, understanding it and can't know anything about C++++++ or assembler.
Bedeox
Sorry, text to speech is nowhere near as computationally hard as speech to text.
Speech to text requires special setup, a learning period, lots of CPU power and doesn't always give good results.
Not to mention it has to be modified for every language.
Good text to speech can be hard to distinguish from human!
(if you have WinXP, check M$ Sam)

For example http://www.scansoft.com/naturallyspeaking/ provides such a product (one of the best...)

(Hacking a brain was partly a joke :D - 'hacking' also means programming)

<edit>
Written fast on a slow keyboard, bleh.
</edit>
de Mon
QUOTE(Bedeox @ Apr 2 2003 - 07:28 AM)
Sorry, text to speech is nowhere near as computationally hard as speech to text.
Speech to text requires special setup, a learning period, lots of CPU power and doesn't always give good results.
Not to mention it has to be modified for every language.
Good text to speech can be hard to distinguish from human!
(if you have WinXP, check M$ Sam)

For example http://www.scansoft.com/naturallyspeaking/ provides such a product (one of the best...)

(Hacking a brain was partly a joke :D - 'hacking' also means programming)

<edit>
Written fast on a slow keyboard, bleh.
</edit>

It was also fast reading, so I mixed up your text>speech and speech>text. Sorry. :D