Audio Coding for Wireless Applications
By Stephen Wray, Greg Massey Audio Processing Technology Ltd.
Wireless audio is a hot topic in consumer electronics. The world has gone wireless in many media, yet high quality wireless audio has been slow to emerge.
Delivering seamless quality audio in real time using wireless technology is one of the great challenges facing the professional audio engineer. Wireless audio transfer has for some time been hampered by bandwidth constraints, coding delays and the introduction of bit errors which cause significant degradation to the audio quality.
The quantum leap in the mobile audio device market has resulted in a very savvy consumer who is well used to scrutinizing audio. Average Joe has grown a set of golden ears and is demanding that new product development not only removes the wires to those ears but also makes the audio better rather than worse.
The increase in demand for professional grade audio in wireless headsets, wireless microphones, wireless 5.1 surround speakers and wireless live broadcasting necessitates a high quality feature rich audio coding solution.
Hurdling the technical barriers inherent in wireless audio applications, apt-X (an audio compression algorithm honed in the pro-audio market and now proven in the consumer wireless audio market) is a strong solution for achieving high quality wireless audio transmission. Operating within the necessary bandwidth requirements and with a latency of less than 1.9 ms, apt-X is extremely robust and is able to encode and decode high quality audio in real time.
This paper discusses wireless audio transmission and the issues that need consideration, and suggests the apt-X algorithm as an appropriate and advantageous coding solution.
1 Wireless Transmission
The proliferation of wireless technologies such as Bluetooth, WiFi, 3G and UMTS has given the end customer the ability to receive digital audio anywhere, anytime.
However, with every advance there is a bottleneck where one technology advances beyond the capabilities of another. Bandwidth limitation is an obvious problem for wireless applications. For wireless microphones in particular, the scramble for available bandwidth in the switchover from analogue to digital TV may be dominated by the telcos, edging wireless audio into more restricted spectrum than it currently enjoys. For live performance or live streaming audio, coding delay is also a prohibitive constraint, one that has prevented applications such as digital wireless microphones from entering the market. Coding delay also has implications for video applications where lip sync is required, for example a wireless stereo headset used in conjunction with a video iPod or mobile TV.
To take advantage of the improvements in wireless technology and bring them into the live transmission space, the industry requires a low coding delay algorithm with enough compression to meet the bandwidth constraints and enough quality to reach the 100 dB dynamic range required by most live applications and quality audio products.
2 Audio Quality
Sixteen bit audio is regarded as the entry level for audio systems now on the market with a minimum sample rate of 44.1 kHz to match that of the venerable audio CD.
Dynamic Range of 16 Bit Digital Audio = 20 log10(2^16) = 96.33 dB
Dynamic Range of 20 Bit Digital Audio = 20 log10(2^20) = 120.4 dB
Dynamic Range of 24 Bit Digital Audio = 20 log10(2^24) = 144.5 dB
Bit Rate per Channel
Table 1 Audio Metrics
Taking CD audio quality as a benchmark, 16-bit, 44.1 kHz audio has a dynamic range of 96 dB. To achieve this level of dynamic range in bandwidth limited applications such as Bluetooth Stereo headsets, it will be necessary to use at least 16-bit audio as the raw input and then use a compression technology that can reproduce virtually all the original dynamic range at the output. A challenge will be to find an algorithm that is able to deliver this quality level with very low latency. A greater challenge would be to use 24-bit audio whilst maintaining the low delay characteristic.
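The dynamic range figures above, and the raw bit rate each word length implies, can be checked with a few lines of Python. This is only a worked calculation; the function names are illustrative and the bit rate is simply word length multiplied by sample rate for one uncompressed PCM channel.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def pcm_bit_rate(bits: int, sample_rate_hz: int) -> int:
    """Uncompressed bit rate of one PCM channel, in bits per second."""
    return bits * sample_rate_hz


for bits in (16, 20, 24):
    rate_kbps = pcm_bit_rate(bits, 44_100) / 1000
    print(f"{bits}-bit: {dynamic_range_db(bits):.2f} dB, "
          f"{rate_kbps:.1f} kbit/s per channel at 44.1 kHz")
```

Each additional bit of word length adds roughly 6.02 dB of dynamic range and, at a fixed sample rate, a proportional increase in raw bit rate.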
3 Coding Delay
The main difficulty for live audio is the coding/decoding delay of the compression technology. While in most wired solutions the audio coding delay is masked by the lengthy video decoding delay, wireless applications have no such luxury. Keeping audio in lip sync with decoded video after it has been encoded, packetised, passed over a wireless link and then decoded is a significant challenge.
In most applications the radio will have its own inherent characteristics and be bound by a standard. If we assume that the radio is fixed and that the packing and unpacking of the RF protocol are fixed, that leaves only the audio compression to work with.
If we look at Bluetooth for example, it uses a series of transmission and reception time slots that are fixed in size and therefore have a limitation in terms of maximum bit rate and response time. The protocol also utilises the ability to retransmit packets to correct errors in the transmitted stream.
If the number of retransmissions could be minimised by making the algorithm more robust, and if the decoder could start the decoding process with only four encoded samples, then it should be possible to improve the response of the system.
4 Compression Ratios
It can be said that all compression of audio results in some loss of audio content. The higher the compression ratio, the more audio content is lost. Both ADPCM and perceptual codecs lose audio in some way during the encoding and decoding process. Perceptual codecs analyse the frequency spectrum and remove content deemed to be imperceptible to the human ear. The resultant audio is tuned to the human ear and thus sounds good even with the audio content removed. This analysis requires a large block of audio (some 512 bytes) over which the analysis takes place, and in many cases this is the source of the coding delay. The complexity of the audio can also affect the delay of the encoding process.
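To see why a block-based analysis adds delay, note that the encoder cannot begin until the whole block has been captured. The sketch below estimates that buffering time alone. It assumes, purely for illustration, a block of 512 samples per channel (the text gives the block size in bytes, so the sample count is an assumption), and it ignores processing and transmission time.

```python
def block_delay_ms(block_samples: int, sample_rate_hz: int) -> float:
    """Time needed to collect one analysis block before encoding can start."""
    return 1000.0 * block_samples / sample_rate_hz


# Hypothetical block size of 512 samples at two common sample rates.
for fs in (44_100, 48_000):
    print(f"512 samples at {fs} Hz: {block_delay_ms(512, fs):.2f} ms")
```

Under that assumption the buffering alone already exceeds 10 ms at both sample rates, before any encoding, transmission or decoding has taken place.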
ADPCM codecs introduce other types of loss due to their own individual characteristics. The quantisation process is by nature lossy and, depending on the accuracy of the linear predictor and inverse quantiser used, can produce small errors in the reproduced audio. However, although the reproduced audio may contain small errors, ADPCM does not remove audio content.
Because ADPCM does not analyse the audio spectrum, its processing delay is significantly lower, yet it produces the same dynamic range and preserves the audio content.
5 Wireless World
Personal audio monitoring via a headset worn by an artist poses new challenges for digital wireless audio. Not only does the audio have to travel from the microphone to the mixing desk but it has to also travel all the way back to the artist.
So what, you may ask, is the acceptable delay an artist can deal with in a live performance situation? What is the goal of the ideal system? Recent tests have shown that delays greater than 10 ms can cause problems for artists, depending on the type of music being played.
A similar constraint on latency applies to wireless speakers for home theatre, where a delay greater than 10 ms can impair the seamless full surround sound experience.
Bluetooth Stereo headsets currently use SBC, a low delay ADPCM type codec. However, technology constraints mean there is a requirement for a new algorithm that improves the quality of the audio and the response time of the system, and thereby achieves lip sync with mobile video products.
5.1 Enhanced apt-X
Enhanced apt-X is based on sub-band Adaptive Differential Pulse Code Modulation (ADPCM). The algorithm can offer a word depth of 16, 20 or 24-bit and transparently codes PCM audio with a fixed compression ratio of 4:1.
A two stage QMF filter bank is used to split the audio spectrum into four discrete sub-bands. The QMF analysis filters used in the encoder give linear phase response. The inverse QMF filter bank in the decoder also gives linear phase response and provides near perfect reconstruction of the presented audio.
The overall framework of apt-X ensures a high resilience to random bit errors. The bit error response is well matched to the auditory response of the human ear, with critical signals being relatively immune to errors. A bit error rate of 1 in 10^3 is normally inaudible. This inherent resilience can be attributed to the adaptive differential coding operating within the de-coupled sub-bands. Distortions introduced by bit errors are constrained within a sub-band. In addition the backward adaptive prediction and quantization tend to reduce the significance of random errors by spreading their effect over the trailing window of samples used for the adaptation. Furthermore, the magnitude of the effect of a bit error is proportional to the magnitude of the differential signal being decoded at that instant. Thus, if the transmitted differential signal is small, which will be the case for a low-level signal or a resonant highly predictable signal, any bit error will have very little effect on either the predictor or quantizer and hence should be inaudible.
Enhanced apt-X takes a series of PCM words and passes these through a 2-stage QMF tree. As well as splitting the signal into 4 sub-bands, the QMF tree down-samples the clock rate to ¼ of the incoming clock and produces 4 x (16, 20 or 24-bit) frequency samples, which are then passed into the four sub-band processing routines.
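The structure of a two-stage QMF tree can be sketched with the simplest possible filter pair. The example below uses 2-tap Haar filters as a stand-in: they are an assumption made for brevity, not the filters used in Enhanced apt-X, and they separate the bands far less sharply than a practical QMF design would. What the sketch does show is the shape of the tree: each stage halves the sample rate, two stages yield four sub-bands at a quarter of the input rate, and the inverse tree reconstructs the input.

```python
import math

S = 1 / math.sqrt(2)  # normalisation so each stage preserves signal energy


def split(x):
    """One Haar stage: return (low, high) bands, each at half the input rate."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x) - 1, 2)]
    return low, high


def merge(low, high):
    """Inverse of split: interleave the reconstructed sample pairs."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * S)
        out.append((lo - hi) * S)
    return out


def analyse(x):
    """Two-stage tree: four sub-bands, each at a quarter of the input rate."""
    low, high = split(x)
    return [*split(low), *split(high)]


def synthesise(bands):
    """Inverse two-stage tree: rebuild the full-rate signal from four bands."""
    b0, b1, b2, b3 = bands
    return merge(merge(b0, b1), merge(b2, b3))


pcm = [float((7 * n) % 11 - 5) for n in range(16)]  # arbitrary test input
bands = analyse(pcm)
restored = synthesise(bands)
print([len(b) for b in bands])                       # four bands, 4 samples each
print(max(abs(a - b) for a, b in zip(pcm, restored)))  # reconstruction error
```

The input length is assumed to be a multiple of four. With these toy filters the reconstruction is exact to within floating-point rounding; longer filters trade a little reconstruction accuracy and some delay for much better band separation.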
These signals, still in (for example) 16-bit format, are then processed simultaneously in four separate signal chains, each incorporating a backward linear prediction loop that provides an estimate of the input signal. The prediction, based on the history of the previous 90 PCM samples, is subtracted from the input to yield a difference signal, commonly termed the error signal. This 16-bit error signal is then re-quantised using a backward adaptive Laplacian quantiser whose step sizes adapt to the magnitude of the error signal. The bit resolution of each of the four sub-band quantisers is different and much lower than that of the PCM input. This reflects the non-linear frequency sensitivity characteristic of the human ear.
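The principle of backward adaptation can be illustrated with a toy ADPCM codec for a single sub-band. Everything specific in this sketch is an assumption chosen for brevity: a fixed first-order predictor stands in for the adaptive predictor, a simple grow/shrink rule stands in for the Laplacian quantiser, and the 4-bit code width and 12 kHz sub-band rate are hypothetical. The point it demonstrates is that both the prediction and the step size are updated only from the transmitted codes, so the decoder tracks the encoder without any side information.

```python
import math


class Adpcm:
    """Toy backward-adaptive ADPCM codec for one sub-band (illustrative only)."""

    def __init__(self, bits=4):
        self.max_code = 2 ** (bits - 1) - 1  # symmetric code range, e.g. -7..7
        self.step = 16.0                     # current quantiser step size
        self.pred = 0.0                      # prediction of the next sample
        self.last = 0.0                      # most recent reconstructed sample

    def _update(self, code):
        # Shared by encoder and decoder: depends only on the code and past state.
        self.last = self.pred + code * self.step
        large = abs(code) > self.max_code // 2
        self.step = min(max(self.step * (1.6 if large else 0.9), 1.0), 8192.0)
        self.pred = 0.9 * self.last          # fixed first-order predictor
        return self.last

    def encode(self, sample):
        # Quantise the difference between the input and the prediction.
        code = round((sample - self.pred) / self.step)
        code = max(-self.max_code, min(self.max_code, code))
        self._update(code)
        return code

    def decode(self, code):
        return self._update(code)


enc, dec = Adpcm(), Adpcm()
samples = [8000 * math.sin(2 * math.pi * 1000 * n / 12000) for n in range(48)]
codes = [enc.encode(s) for s in samples]
decoded = [dec.decode(c) for c in codes]
print(codes[:12])
```

Because the decoder runs the same update as the encoder, its output equals the encoder's local reconstruction whenever the codes arrive intact. After a corrupted code the two states diverge briefly, and with a leaky predictor such as the one assumed here the difference decays over the following samples.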
The four code-words, one from each sub-band, are then multiplexed into a single 16-bit apt-X code word suitable for transmission. This apt-X code word therefore represents the content of the original 4 x 16-bit linear PCM samples (64 bits), a rate reduction of 4:1.
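The multiplexing step amounts to packing four short code-words into one 16-bit word. The sketch below shows one way this could be done. The per-band widths in BITS are an assumption for illustration only, since the text does not give the allocation; the only property taken from the text is that the four widths differ and total 16 bits.

```python
BITS = (7, 4, 2, 3)  # hypothetical bits per sub-band code-word; sums to 16


def pack(codes, bits=BITS):
    """Multiplex four unsigned sub-band code-words into one integer word."""
    word = 0
    for code, width in zip(codes, bits):
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def unpack(word, bits=BITS):
    """De-multiplex a packed word back into its four sub-band code-words."""
    codes = []
    for width in reversed(bits):
        codes.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(codes))


word = pack((85, 10, 1, 6))
print(f"{word:016b}", unpack(word))
print("compression ratio:", (4 * 16) // sum(BITS))
```

Four 16-bit PCM samples (64 bits) enter the encoder for every 16-bit word that leaves it, which is where the fixed 4:1 ratio comes from regardless of how the 16 bits are shared between the bands.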
The input to the decoder is the Enhanced apt-X compressed word. This word is de-multiplexed into the four low bit resolution code-words, which are fed into the four separate sub-bands. In each sub-band an identical predictor and inverse quantiser circuit is used to reconstruct the actual signal from the difference signal received from the encoder. Finally the four reconstructed 16-bit, bandwidth limited samples are inverse filtered through the 2-stage inverse QMF filter and leave the decoder as a serial data stream of full bandwidth, 16-bit PCM samples. This stream has the same data rate as the original PCM signal at the input to the encoder, and the output is pure linear PCM. Latency in the algorithm is due simply to the samples passing through the QMF stages, equating to 1.87 ms at 48 kHz.
Enhanced apt-X also has an embedded word pattern to help connection and synchronisation. This feature, AutoSync, allows the decoder to synchronise quickly, within 3 milliseconds, on start-up or after a dropout.
Figure 1 shows the transient response of Enhanced apt-X. As can be seen the audio recovers to normal levels in less than 3 milliseconds.
Figure 1 Enhanced apt-X 16 Bit
The ability to recover quickly from packet loss improves the response of the overall system. This, in conjunction with the ability to start the synchronisation process from the next valid sample received, achieves synchronisation within 3 ms of receipt of that sample. The Bluetooth protocol relies on retransmission of packets to preserve audio quality, with packets being rejected based upon the presence of bit errors or partial packet loss. If, however, the decoder is simply allowed to continue processing even corrupted samples, the disruption to the audio stream should be minimal and the latency of the system should be reduced still further.
Another feature of Enhanced apt-X is that it can transport an auxiliary data stream at a rate of Fs/4 using a technique called Subtractive Buried Data. Using this technique it is possible to carry the data with no loss in audio quality. It is a raw bit stream and can therefore be used as part of wider system control and configuration.
Consumer wireless audio is a reality. Bluetooth technology is booming, with audio a key driver. As applications now require more channels (e.g. Bluetooth Stereo) and interact with video, a compression algorithm is needed that has extremely low latency and maintains high quality sound. The core strengths of minimal delay, error resilience and high audio quality make Enhanced apt-X a strong enabler for wireless speakers, wireless microphones and wireless headsets, and a well suited solution for high quality wireless audio transmission. Deliverable as Soft, Firm or Hard-IP, the Enhanced apt-X algorithm can be integrated in DSP, FPGA or ASIC.