Audio formats, latency and bit rate

Audio coding for IP: Latency vs. bit rate

Over the last twenty-five years, various coding algorithms have become established in the broadcasting industry, e.g. J.41, J.57, MPEG Layer 2 and 3, as well as non-standardized algorithms such as apt-X or ADPCM4SB (Micda). Because demands vary so widely, it is not easy for an operator to choose the proper bit rate, operation mode and sample rate. The steady flow of new algorithms developed in recent years makes the choice even more difficult.

Besides those well-known formats, others from telephony, such as G.711 and G.722, have been applied in broadcasting, as have the more modern and successful AAC variants, such as MPEG 2 and 4 AAC, HE AAC (formerly aacPlus), HE AACv2 and AAC ELD*, together with linear audio and AES/EBU transparent transmission. All of these formats come with modes such as mono and stereo, at a range of sampling and bit rates, and some even support multichannel (5.1/7.1) operation.

The development of coding algorithms has focused on optimizing parameters such as bit rate, quality (including quality after multiple encoding/decoding cycles), latency and compatibility. The algorithms described here can be categorized along those lines. HE AACv2, for example, was developed solely to reduce the bit rate while retaining very good sound quality, whereas apt-X has always focused on low delay.

Due to the growing bandwidth available, linear transmission methods such as 16, 20 or 24 bit linear audio, or even the transparent transmission of AES/EBU at 3.072 Mbit/s, are increasingly being considered. Here quality and delay play the main roles, at the cost of bandwidth.

There is no detailed research on the market share of the various coding algorithms in the broadcasting world. However, it is widely assumed that MPEG Layer 2 and G.722 dominate, while algorithms such as linear audio, ADPCM, apt-X and Enhanced apt-X, as well as the MPEG 4 standard HE AAC, are increasingly being used for corresponding applications.
A detailed discussion on choosing a proper method must include consideration of quality, flexibility, bandwidth, latency, compatibility, standardization, market share and expectations. In fact, an optimal solution can be found for almost any system.
*work item of MPEG

 

Audio Encoding Algorithms: Quality and Delay

 

G.711

One of the most important ITU-T standards is G.711, which digitizes mono audio at a sampling rate of 8 kHz and covers the frequency range between 300 and 3400 Hz. Classical telephony with G.711 uses 64 kbps in Europe and 56 kbps in North America. G.711 is also used in VoIP when a conventional telephone needs to be reached via IP. The latency of G.711 is not noticeable.
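The two telephony bit rates follow directly from the 8 kHz sampling rate. The short sketch below works through the arithmetic; the function name is illustrative only, and the 7 bits per sample used for the 56 kbps case is an assumption made here to explain the figure, not something stated above.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the bit rate in bit/s of a fixed-word-length sample stream."""
    return sample_rate_hz * bits_per_sample * channels


G711_SAMPLE_RATE_HZ = 8000  # sampling rate given in the text

# 8 bits per sample fill a complete 64 kbps channel.
rate_64k = pcm_bit_rate(G711_SAMPLE_RATE_HZ, 8)

# Assumption: on a 56 kbps channel only 7 of the 8 bits carry audio.
rate_56k = pcm_bit_rate(G711_SAMPLE_RATE_HZ, 7)

print(f"8 bits/sample: {rate_64k / 1000:.0f} kbps")
print(f"7 bits/sample: {rate_56k / 1000:.0f} kbps")
```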

G.722

Another ITU-T standard is G.722, which offers a higher audio quality due to its 16 kHz sampling rate. The transmission bit rate is also 64 kbps (e.g. one ISDN B channel). Similar to G.711, latency is very low.
There are two possibilities to synchronize audio codecs with G.722:

G.722 with H.221 inband signalling (G.722/H.221)

In G.722/H.221, 1.6 kbps of the bit rate is occupied by inband information, which is used for synchronisation.

G.722 with statistical synchronisation (G.722/SRT)

With G.722/SRT (SRT = Statistical Recovery Timing), the beginning of a byte is recognized by statistical analysis of the audio signal. It should be noted that this works only with a genuinely statistical signal such as music or speech, but not with pure tones.

CELT

CELT is a method with extremely low latency at low data rates between 48 kbit/s mono and 128 kbit/s stereo. It is particularly used with FlashCast.

 

Audio Coding Algorithms: Sample Rates in kHz

 

MPEG Layer 2

Dating from the 1990s, Layer 2 is still very popular and widely used in broadcasting applications. The main reasons for its popularity are its cascadability (good for a lossy codec), its high audio quality at high bit rates and its availability in many first-generation hardware and software products. MPEG Layer 2 supports bit rates of 8 to 384 kbps; the target bit rate for stereo is 256 kbps.

MPEG Layer 3

Better known as mp3, this format is also used in broadcasting, e.g. when lower bit rates than Layer 2 are required. MPEG Layer 3 supports bit rates between 8 and 320 kbps with a target bit rate of 128 to 192 kbps for stereo.

MPEG 4 HE AAC

HE AAC is a further development of AAC using so-called SBR (Spectral Band Replication) from Coding Technologies (www.codingtechnologies.com). AAC is the audio coding algorithm in MPEG 2 and 4 with the highest audio quality and a target bit rate of 128 kbps stereo. Because many applications need even lower bit rates, the Swedish-German co-operation in Coding Technologies invented the SBR technology, which allows AAC at bit rates as low as 32 or 48 kbps stereo.

More than 90% of the bit rate is still used by the conventional AAC encoding, but a small part (<4 kbit/s) is used for the SBR information. The conventionally encoded AAC part is sampled at half the sampling frequency (16, 22.05 or 24 kHz), which results in a higher coding efficiency.
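The relationship between the output sampling frequency and the halved AAC core rate, and the share of the bit rate taken by SBR, can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are made up for the example, the 48 kbps total is one of the stereo rates mentioned above, and 4 kbit/s is used as the upper bound for the SBR part.

```python
def core_sample_rate_hz(output_sample_rate_hz: float) -> float:
    """The AAC core in HE AAC runs at half the output sampling frequency."""
    return output_sample_rate_hz / 2


def sbr_share(total_kbps: float, sbr_kbps: float) -> float:
    """Fraction of the total bit rate spent on SBR side information."""
    return sbr_kbps / total_kbps


# Output rates of 32, 44.1 and 48 kHz give the core rates named in the text.
for output_rate in (32000, 44100, 48000):
    print(f"{output_rate} Hz output -> {core_sample_rate_hz(output_rate):.0f} Hz core")

# Assumption for illustration: 48 kbps stereo with the full 4 kbit/s of SBR data.
share_at_48 = sbr_share(total_kbps=48, sbr_kbps=4)
print(f"SBR share at 48 kbps: {share_at_48:.1%}")
```

With these example values the SBR part stays below 10% of the stream, which is consistent with more than 90% remaining for the conventional AAC encoding.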

The combination of SBR and AAC is a high-quality format. Although no claims for real transparency can be made, since the frequencies above 7 or 8 kHz are synthesized, it is still possible to reach an extremely good, CD-like quality at such low bit rates. In connection with HE AAC, CD-like really means high quality, which is proven by the many applications and systems using it. Combined with so-called "parametric stereo" coding, the algorithm is known as HE AACv2 and provides astonishing sound quality even at 16, 20 or 24 kbit/s.

apt-X and Enhanced apt-X

Apt-X is a low-latency coding format using ADPCM (Adaptive Differential Pulse Code Modulation). A typical data rate is 192 kbps for mono, and sound quality is retained even after multiple encoding/decoding cycles, so-called cascading. Theoretically, the minimum latency is 3 ms at a sample rate of 48 kHz. The algorithm can be used at various other sample rates. Recent improvements, especially to the dynamic range, resulted in Enhanced apt-X, which uses word lengths of up to 24 bit. Apt-X is one of the most widely used systems for audio transmission with a short delay.

Its key features are:

  • 4:1:4 data compression
  • Mono / stereo audio encoder / decoder
  • Flexible sample rate up to 96 kHz
  • Ancillary data up to 12 kbit/s
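
The typical 192 kbps mono rate can be related to the compression ratio with a short calculation. The sketch below rests on two assumptions that are not stated above: that "4:1:4" means a 4:1 reduction on the encoder side (expanded back by the decoder), and that standard apt-X works on 16 bit input words. The function names are illustrative only.

```python
def aptx_bit_rate(sample_rate_hz: int, word_length_bits: int,
                  channels: int = 1, compression_ratio: int = 4) -> int:
    """Bit rate in bit/s after fixed-ratio compression of a PCM signal."""
    return sample_rate_hz * word_length_bits * channels // compression_ratio


def latency_in_samples(latency_s: float, sample_rate_hz: int) -> int:
    """Express a delay as a whole number of sample periods."""
    return round(latency_s * sample_rate_hz)


# Assumption: 16 bit words at 48 kHz, compressed 4:1.
mono = aptx_bit_rate(48000, 16, channels=1)
stereo = aptx_bit_rate(48000, 16, channels=2)
# Enhanced apt-X with 24 bit words, same assumed ratio.
enhanced_mono = aptx_bit_rate(48000, 24, channels=1)

print(f"mono: {mono / 1000:.0f} kbps, stereo: {stereo / 1000:.0f} kbps")
print(f"Enhanced apt-X mono (24 bit): {enhanced_mono / 1000:.0f} kbps")
print(f"3 ms at 48 kHz = {latency_in_samples(0.003, 48000)} samples")
```

Under these assumptions the mono result matches the 192 kbps quoted above.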

Linear audio and AES/EBU transparent

The increasing availability of higher bandwidths means that the use of linear audio and "AES/EBU Transparent" is a realistic alternative. Linear audio is understood as a PCM signal with a specified word length of 16, 20 or 24 bit and a fixed sample rate. In the broadcasting world this is usually 48 or 96 kHz, resulting in a bit rate of roughly 1.5 to 4.6 Mbit/s for a stereo signal.
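
The bit rate of linear audio is simply word length times sample rate times channel count. The following sketch, with an illustrative function name, lists the stereo bit rates for the word lengths and sample rates mentioned above; it counts audio payload only and ignores any transport overhead.

```python
from itertools import product


def linear_bit_rate(word_length_bits: int, sample_rate_hz: int, channels: int = 2) -> int:
    """Payload bit rate in bit/s of an uncompressed PCM signal."""
    return word_length_bits * sample_rate_hz * channels


# All combinations of the word lengths and sample rates named in the text.
for bits, rate in product((16, 20, 24), (48000, 96000)):
    mbit = linear_bit_rate(bits, rate) / 1_000_000
    print(f"{bits} bit @ {rate // 1000} kHz stereo: {mbit:.3f} Mbit/s")
```

The smallest case (16 bit at 48 kHz) and the largest (24 bit at 96 kHz) mark the ends of the stereo range.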

An AES/EBU Transparent transmission can carry Dolby E, DTS etc. within the AES/EBU signal without any sample rate conversion (the encoded data would otherwise be irreversibly corrupted). The data rate of an AES/EBU signal is 3.072 Mbit/s, with multichannel signals being a multiple of this.
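
The 3.072 Mbit/s figure can be reproduced from the AES/EBU frame structure: each frame carries two 32 bit subframes, one per channel, and one frame is sent per sample period. The sketch below assumes a 48 kHz sample rate and assumes that multichannel audio is carried as several two-channel AES/EBU streams; the function names are illustrative only.

```python
import math

SUBFRAME_BITS = 32        # one subframe per channel, including status bits
SUBFRAMES_PER_FRAME = 2   # an AES/EBU stream carries two channels


def aes_ebu_bit_rate(sample_rate_hz: int = 48000) -> int:
    """Bit rate in bit/s of one two-channel AES/EBU stream."""
    return sample_rate_hz * SUBFRAME_BITS * SUBFRAMES_PER_FRAME


def streams_needed(channels: int) -> int:
    """Assumption: multichannel audio is split over two-channel streams."""
    return math.ceil(channels / 2)


def multichannel_bit_rate(channels: int, sample_rate_hz: int = 48000) -> int:
    """Total bit rate in bit/s for the streams needed to carry all channels."""
    return streams_needed(channels) * aes_ebu_bit_rate(sample_rate_hz)


print(f"one stream: {aes_ebu_bit_rate() / 1_000_000:.3f} Mbit/s")
print(f"6 channels: {multichannel_bit_rate(6) / 1_000_000:.3f} Mbit/s")
print(f"8 channels: {multichannel_bit_rate(8) / 1_000_000:.3f} Mbit/s")
```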

Multichannel

Encoding of 5.1 or 7.1 multichannel signals is supported by a variety of formats. In particular, HE AAC offers very efficient coding with bit rates below 128 kbps. Enhanced apt-X provides up to 8 channels with bit rates between 1 and 2 Mbit/s, while linear formats move the bit rate up to 18 Mbit/s.
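
The upper figure for linear formats can be reproduced by counting the channels in a layout and multiplying out. The sketch below assumes that the 18 Mbit/s case corresponds to a 7.1 signal at 24 bit and 96 kHz, which is not stated explicitly above, and its helper names are illustrative only.

```python
def channels_in_layout(layout: str) -> int:
    """Count channels in a layout label such as '5.1' (5 main + 1 LFE)."""
    main, lfe = layout.split(".")
    return int(main) + int(lfe)


def linear_multichannel_bit_rate(layout: str, word_length_bits: int,
                                 sample_rate_hz: int) -> int:
    """Payload bit rate in bit/s of an uncompressed multichannel signal."""
    return channels_in_layout(layout) * word_length_bits * sample_rate_hz


for layout in ("5.1", "7.1"):
    rate = linear_multichannel_bit_rate(layout, 24, 96000)
    print(f"{layout} @ 24 bit / 96 kHz: {rate / 1_000_000:.3f} Mbit/s")
```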

 

Audio Coding Algorithms: Bit rates in kbit/s
