
© 2017 DIGIFON

From "The ISDN Studio" by Dave Immer
Audio Engineering Society 99th Convention
Oct. 8, 1995, New York City

Audio Coding ALGORITHMS

For audio applications, the algorithm is a model - or a set of rules - by which a PCM bit-stream is analyzed and re-quantized into a reduced bit-stream. Audio coding algorithms differ in how they deal with the irrelevancy and redundancy contained in the PCM signal and fall into two basic categories, Transform and Predictive, both of which have subband variations. Transform coding is frequency-domain based and Predictive coding is time-domain based.

A frequency-domain algorithm applies bit reduction according to known characteristics of human hearing (contained in an on-board lookup table). This process is called perceptual coding: only psycho-acoustically "relevant" waveform information is transmitted and reconstructed at the decoder, where the noise introduced by the bit reduction is dynamically masked within the subbands of audio having the most energy at the moment. Audio frequency response for frequency-domain coding is much less bit-rate dependent than for a time-domain process, but the coding delay is greater.

A time-domain approach uses predictive analysis based on look-up tables available to the coder, transmits only the differences between the prediction and the actual sample, and adds the redundant information back at the decoder. The audio frequency response depends on the bit-rate of the transmission, but this method results in a very low coding delay.

Both approaches work quite well and each has its advantages. A major advantage of predictive coding such as APT-X is that it is a "near-lossless" treatment, making it a good choice for production applications. MPEG layer 2 allows for "tweaks" and improvements on the coding side (such as the Musicam implementation) that are "followed" by the decoding side. This makes an MPEG layer 2 decoder much less complex (and cheaper) than the encoder, which in turn makes MPEG layer 2 a major contender for digital radio broadcasting.
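
The time-domain idea described above, transmitting only the difference between a prediction and the actual sample, can be illustrated with a minimal sketch. This is not the APT-X or G.722 algorithm (both are considerably more elaborate); it assumes the simplest possible predictor, "the next sample equals the last reconstructed one", and a fixed quantizer step. The names dpcm_encode, dpcm_decode and step are hypothetical and chosen only for this illustration.

```python
def dpcm_encode(samples, step=4):
    """Encode integer PCM samples as quantized prediction residuals.

    Assumption for illustration: the predictor is simply the previously
    reconstructed sample, and the quantizer step is fixed.
    """
    prediction = 0
    residuals = []
    for sample in samples:
        diff = sample - prediction      # difference from the prediction
        q = round(diff / step)          # coarse re-quantization of the difference
        residuals.append(q)
        prediction += q * step          # mirror what the decoder will rebuild
    return residuals


def dpcm_decode(residuals, step=4):
    """Rebuild samples by adding each residual back onto the prediction."""
    prediction = 0
    output = []
    for q in residuals:
        prediction += q * step
        output.append(prediction)
    return output


if __name__ == "__main__":
    pcm = [0, 10, 25, 40, 38, 20, -5]
    coded = dpcm_encode(pcm)
    print("residuals:", coded)
    print("decoded:  ", dpcm_decode(coded))
```

Because the encoder tracks the decoder's reconstruction rather than the original signal, the error on each sample stays within half a quantizer step and does not accumulate. The residuals are small numbers that need fewer bits than the original samples, which is where the bit reduction comes from.
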
The reason all this number crunching must be done in the first place can be illustrated with a few simple formulas. Stereo audio (2 channels) sampled at the CD rate of 44,100 samples per second with 16-bit resolution creates a real-time "bit-rate" of 1,411,200 bits per second. But the available bit-rate - or "digital bandwidth" - of Basic Rate ISDN service is only 128,000 bps, roughly 9% of what is needed for unreduced digital audio. So a bit reduction of about 11:1 (12:1 in round numbers) is needed to "fit" the digital audio into the available speed of a single ISDN line. With two ISDN lines (256 Kbps) the bit reduction need only be about 6:1 (more precisely 5.5:1), and with three lines (384 Kbps) about 4:1 (3.7:1).

"CD quality" stereo audio: 2 x 44,100 x 16 = 1,411,200 bits per second (1.411 Mbps)

Basic Rate ISDN = 128,000 bits per second (128 Kbps)

128,000 ÷ 1,411,200 ≈ 0.09 (9%)
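
The arithmetic above can be checked with a short script. This is only a sketch of the calculation in the text; the function and constant names (pcm_bit_rate, reduction_ratio, BRI_LINE_BPS) are made up for the example. It shows that the exact ratios come out slightly under the round figures of 12:1, 6:1 and 4:1.

```python
B_CHANNEL_BPS = 64_000              # one ISDN bearer channel
BRI_LINE_BPS = 2 * B_CHANNEL_BPS    # Basic Rate ISDN line: 2 B channels = 128 Kbps


def pcm_bit_rate(channels, sample_rate_hz, bits_per_sample):
    """Bit-rate of unreduced PCM audio in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


def reduction_ratio(source_bps, available_bps):
    """How many times the source bit-rate must be reduced to fit."""
    return source_bps / available_bps


if __name__ == "__main__":
    cd_rate = pcm_bit_rate(2, 44_100, 16)
    print(f"CD-quality stereo: {cd_rate:,} bps")
    for lines in (1, 2, 3):
        available = lines * BRI_LINE_BPS
        ratio = reduction_ratio(cd_rate, available)
        print(f"{lines} ISDN line(s), {available:,} bps: about {ratio:.1f}:1")
```
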

With inverse multiplexing, ISDN "B" - or bearer - channels can be aggregated and synchronized in increments of 64 Kbps to create "data pipes" of any size desired. Each 64 Kbps B channel, when used for a domestic call, is billed by the phone company at roughly the same rate as a standard phone call. So a 128 Kbps connection, which requires 2 B channels, is billed for 2 phone calls. A 256 Kbps connection = 4 phone calls, etc. At higher data speeds, the digital audio bit-stream requires less reduction, resulting in either more of the original waveform being transmitted and less noise to mask at the receiving end, or, in the case of time-domain coding, improved audio bandwidth.
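
The relationship between pipe size, B channels and billed calls described above can be sketched as follows. The helper names are hypothetical, and the billing rule (one phone call per B channel) is simply the one stated in the text, not a tariff calculation.

```python
B_CHANNEL_BPS = 64_000  # each ISDN bearer channel carries 64 Kbps


def b_channels_needed(target_bps):
    """Smallest number of 64 Kbps B channels that can carry target_bps."""
    # Integer ceiling division avoids any floating-point rounding.
    return -(-target_bps // B_CHANNEL_BPS)


def aggregate_rate(b_channels):
    """Size of the data pipe created by aggregating b_channels channels."""
    return b_channels * B_CHANNEL_BPS


if __name__ == "__main__":
    for target in (128_000, 256_000, 384_000):
        n = b_channels_needed(target)
        print(f"{target // 1000} Kbps: {n} B channels, billed as {n} phone calls")
```
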

G.722: The first popular "hi-fi" CODECs.

As recently as 1990, the only algorithm in wide use was G.722 (pronounced "gee dot seven twenty two"), a predictive, time-domain algorithm developed by AT&T for broadcast applications. Back when ISDN was still only a gleam in most telephone companies' (and equipment manufacturers') eyes, this algorithm made possible 7.5 kHz mono audio over a single 56 Kbps channel, or "Switched 56" (SW56). Numerous manufacturers built G.722 CODECs, and they were all able to interconnect with each other. G.722 over SW56 phone service or a single ISDN "B" channel is still in wide use today and is quite adequate for speech-only applications such as commercial voice-overs, live announce, interviews, sports feeds, news reporting and high quality audio conferences. Voice programming transmitted via G.722 CODECs is heard quite frequently on radio and TV.