MPEG-4: A system designer's view
By Robert Bleidt, Director, Product Management/Business Strategy, Philips MP4Net, Sunnyvale, Calif., email@example.com, EE Times
November 12, 2001 (12:38 p.m. EST)
MPEG-4 is a unified standard that allows the streaming of both graphics and video in the same interactive, rich multimedia experience.
Nevertheless, the breadth of MPEG-4's capabilities often leads to confusion about what it includes and when practical implementations will be available. An MPEG-4 program can be as simple as a single stream of audio and traditional rectangular video, or as complex as a synchronized VRML-like interactive environment with many arbitrary-shaped video objects, synthesized audio, 2-D or 3-D vector graphics with texture maps and facial animations on avatars. Many of MPEG-4's capabilities are implemented today in commercial products, while others are still being developed.
To distinguish the mature from the promising, the MPEG-4 standard defines profiles that indicate the capabilities of audio, video, graphics and interactivity features. Version 2 of MPEG-4 includes a tool kit of 38 profiles, in contrast to MPEG-2's seven video profiles. An implementation is specified by combining desired profiles for each type of media included.
For example, Philips MP4Net has been shipping MPEG-4 products based on the Simple Visual profile that specifies video up to CIF (352- x 288-pixel) resolution and the High Quality Audio profile that supports speech using the CELP codec and music using the AAC codec.
During the next year, we plan to offer products with the improved compression efficiency and higher quality of the Advanced Simple Visual Profile. We'll also introduce combined video and graphic multimedia programs with the BIFS Core 2-D Graphics and Scene Graph Profiles.
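To make the idea of combining profiles concrete, here is a minimal Python sketch. It assumes a player and a program can each be described as a set of profile names per media type; the dictionary layout and the can_play helper are hypothetical, and the sketch ignores levels and any case in which one profile is a superset of another.

```python
# Hypothetical model: each media type maps to the set of profile names involved.
def can_play(player, program):
    """Return True if the player supports every profile the program requires."""
    return all(
        required <= player.get(media, set())
        for media, required in program.items()
    )

# Profile names are taken from the article; the groupings are illustrative only.
player = {
    "visual": {"Simple Visual"},
    "audio": {"High Quality Audio"},
}
video_program = {
    "visual": {"Simple Visual"},
    "audio": {"High Quality Audio"},
}
rich_program = {
    "visual": {"Advanced Simple Visual"},
    "graphics": {"Core 2-D Graphics"},
    "scene": {"Scene Graph"},
}

print(can_play(player, video_program))
print(can_play(player, rich_program))
```

With these sample values, the player accepts the plain audio and video program and rejects the one that needs the graphics and scene profiles.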
So, with all these choices, how can designers implement profiles that are compatible with other products? Industry consortia such as the Internet Streaming Media Alliance (www.isma.tv) in the broadband and wired domain and the 3GPP Project (www.3gpp.org) in the wireless arena have issued specific sets of profiles and levels, along with portions of related standards that are appropriate for their applications. In ISMA's case, many MPEG-4 vendors have participated in "plug fests" to ensure that their implementations are interoperable. For example, our latest broadband product is compatible with the ISMA 1.0 specification.
So where will MPEG-4 be used? We have identified several applications on emerging broadband networks where we think MPEG-4 will have an advantage.
First, we believe MPEG-4 will be used for secure broadband delivery of news, sports and entertainment content over IP networks such as ADSL to PCs and to television set-top boxes. Most proprietary streaming-format suppliers have portals or services that would compete with content providers or aggregators. With MPEG-4, content providers have an open format that allows them to choose from several suppliers for each component in the signal chain.
The largest application is likely to be delivery of content to 2.5G and third-generation mobile phones and wireless devices. Operators have significantly invested in spectrum licenses for these new services. They need compelling applications to drive consumer purchase of handsets and airtime to recover their investment.
The stability, interoperability and robustness offered by MPEG-4 make it an ideal solution for mobile networks, and it has been selected by 3GPP for use on 3G phones.
Another market is delivery of multimedia or interactive TV content to devices such as Web pads or PDAs over short-range wireless networks. Devices in the home, office or public venues may gain access to MPEG-2 broadcast content or related enhancements at lower bit rates through transcoding to MPEG-4.
MPEG-4 has been attacked as being "science fiction" because of its optional object-coding techniques. In MPEG-4, it is possible to send not only rectangular video, but also arbitrary-shaped video objects that are composited into a scene in the user's player. This allows personalization of a program to each user, or enhanced content to be seen only by premium subscribers. An example is Philips' "Buy the Ball" demonstration.
Segmenting an existing program into separate objects in a general case is a difficult problem. However, there are applications, such as sports, for which specialized real-time hardware already exists to track and separate objects. The general answer, though, is to move back in the signal chain to the creation stage. Almost all content exists as separate objects during the production process. For example, a newscast typically combines an anchorperson shot on a blue-screen set with background or over-the-shoulder video, separate titles from a graphics system and logo "bugs" or tickers inserted downstream. MPEG-4 offers the ability to transmit all of these objects separately to the player, where they can be selectively displayed.
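The selective display described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the BIFS scene description itself: the SceneObject class, the layer numbers, the premium_only flag and the "bonus statistics" object are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    layer: int                 # lower layers are drawn first (assumed convention)
    premium_only: bool = False

def visible_objects(scene, is_premium, hidden=()):
    """Return object names, in draw order, that this viewer's player composites."""
    chosen = [
        obj for obj in scene
        if (is_premium or not obj.premium_only) and obj.name not in hidden
    ]
    return [obj.name for obj in sorted(chosen, key=lambda obj: obj.layer)]

# The newscast objects follow the article's example; the premium object is invented.
newscast = [
    SceneObject("ticker", 3),
    SceneObject("background video", 0),
    SceneObject("anchorperson", 1),
    SceneObject("titles", 2),
    SceneObject("bonus statistics", 4, premium_only=True),
]

print(visible_objects(newscast, is_premium=False))
print(visible_objects(newscast, is_premium=True, hidden={"ticker"}))
```

Because each object arrives separately, the same transmitted program yields a different picture for a regular viewer and for a premium subscriber who has switched the ticker off.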
Some people have criticized MPEG-4 as being subject to uncertain patent royalties. While the process for collecting royalties has always lagged the development and deployment of technology, the MPEG-4 Industry Forum (www.m4if.org) is asking MPEG-4 patent holders to agree on reasonable terms and conditions for licensing, like those for MPEG-2 and MP3. It should also be noted that a similar uncertainty surrounds today's proprietary streaming formats, which likely involve patent licensing issues of their own.
A simple MPEG-4 system consists of an encoder to convert content into MPEG-4 bit streams or files, an optional server to store and stream files to users and a decoder or player to convert them back into the original program. To accommodate programs that are more complex, an authoring tool is used to arrange objects and define their behavior and timing. The server and authoring tools usually run on general-purpose computers, so for embedded applications the encoder and decoder are the most important elements. Depending on the platform you're using, you have a number of options:
- A software-only player or codec, running on a general-purpose CPU
- A codec library for a coprocessor or DSP
- IP cores or models to incorporate a decoder into an ASIC or FPGA design
- Hardware codec or decoder chips
If your system runs a PC operating system such as Windows and uses at least a laptop-class processor, the choice is easy: use a software codec. For example, we offer a free ActiveX control or stand-alone player for Windows. We have also achieved good results with a software player for WinCE devices such as the iPAQ PDA.
Of course, performance depends on the image format and bit rates used. Philips has obtained good results with QCIF video on a 206-MHz StrongARM and with CIF content on a 300-MHz Pentium. We are also investigating porting MPEG-4 software decoders to processors for mobile devices.
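A rough way to see why image format matters is to count the 16 x 16-pixel macroblocks a decoder must process per second. The sketch below assumes the standard QCIF size of 176 x 144 pixels alongside the CIF size quoted earlier; the 15-frame-per-second rate is an arbitrary illustrative value, and the count ignores bit rate, which also affects decoding load.

```python
MACROBLOCK = 16  # video is coded in 16 x 16-pixel macroblocks

FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}

def macroblocks_per_frame(width, height):
    # Round up so partially covered edges still count as whole macroblocks.
    cols = -(-width // MACROBLOCK)
    rows = -(-height // MACROBLOCK)
    return cols * rows

def macroblocks_per_second(fmt, fps):
    return macroblocks_per_frame(*FORMATS[fmt]) * fps

FPS = 15  # assumed frame rate, for illustration only
for fmt in FORMATS:
    print(fmt, macroblocks_per_frame(*FORMATS[fmt]), macroblocks_per_second(fmt, FPS))
```

A CIF frame holds 396 macroblocks against 99 for QCIF, so at the same frame rate CIF asks for about four times the per-frame work.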
For a set-top box or Internet appliance, cost constraints usually dictate a more modest processor, but the product's fixed functionality makes a coprocessor or DSP design attractive. For example, MP4Net offers an MPEG-4 decoder library for the TriMedia TM1300 processor that can render megabit MPEG-4 streams.
Several companies have announced design models or IP cores for embedding a decoder in a custom chip or perhaps a field-programmable gate array. Beyond that, several semiconductor companies are developing decoders and codecs as standard chip products, primarily for mobile handset use. These typically process QCIF-resolution video with power consumption in the 50- to 100-milliwatt range.
In determining which solution to use, consider these questions:
- Will you be using the Simple Visual profile of MPEG-4, which is very stable, or will you want to use profiles that are more advanced? Current hardware decoders and ASIC cores provide Simple profile decoding, but don't provide support for more advanced profiles.
- Does the product need to provide other applications or just MPEG-4 decoding?
- Will third-party developers need to develop applications that involve MPEG-4?
- What are your concerns for power, image quality and cost?
On a wireless network, there is no guarantee that MPEG-4 streams will arrive uncorrupted at a client terminal or phone. Here are potential problems:
- A user's connection may suffer packet loss due to multipath or fading on the RF channel.
- A user's bandwidth may be reduced as more users join a particular cell.
- The flow of data may be interrupted by a handoff from one cell to another as a user moves.
Several schemes permit MPEG-4 to deal with varying bandwidth. The simplest is stream switching, where a user transitions to an MPEG-4 stream encoded at a lower bit rate when the channel degrades. Proprietary formats on the narrowband Internet use similar techniques. Another option is temporal scalability, where a switch is made to a stream with similar spatial resolution but a lower frame rate.
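Stream switching reduces to a simple selection rule on the server or client. The Python sketch below assumes the same content has been encoded at several bit rates; the pick_stream function and the sample rates are hypothetical and not part of any MPEG-4 tool.

```python
def pick_stream(bitrates_kbps, available_kbps):
    """Choose the highest-rate encoding that fits the measured channel.

    Falls back to the lowest-rate encoding when nothing fits.
    """
    fitting = [rate for rate in bitrates_kbps if rate <= available_kbps]
    return max(fitting) if fitting else min(bitrates_kbps)

encodings = [64, 128, 384]  # illustrative bit rates in kbit/s

print(pick_stream(encodings, 200))
print(pick_stream(encodings, 40))
```

A real system would also add hysteresis so that a channel hovering near a threshold does not cause constant switching; that refinement is left out here.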
A more sophisticated approach is fine granular scalability, where the encoded content is divided into a low-bit-rate base-layer stream and an enhancement stream containing high-frequency or temporal detail in numerous small elements, which may be sent independently to fill the instantaneously available bandwidth. Vendors have also discussed transcoding, or encoding content in real time for each user to adjust the bit rate to his or her channel.
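The fine granular scalability idea can be illustrated with a toy model in which the base layer is always sent and the enhancement layer is cut off wherever the current bandwidth budget runs out. The function name, byte counts and budget below are assumptions made for the sketch; a real encoder orders the enhancement data so that any prefix of it improves the picture.

```python
def fgs_payload(base, enhancement, budget_bytes):
    """Return (base, enhancement_prefix) sized to fit the per-frame budget.

    The base layer is always sent in full, even if it exceeds the budget,
    because the enhancement data is useless without it.
    """
    spare = max(0, budget_bytes - len(base))
    return base, enhancement[:spare]

base_layer = b"B" * 100          # illustrative sizes only
enhancement_layer = b"E" * 400

for budget in (50, 250, 1000):
    base, extra = fgs_payload(base_layer, enhancement_layer, budget)
    print(budget, len(base), len(extra))
```

Unlike stream switching, there is a single encoding, and the sender adapts frame by frame simply by choosing how much of the enhancement data to transmit.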
Another MPEG-4 advantage is a framework that allows a mobile terminal's decoder to resynchronize quickly if the bit stream is interrupted by transmission errors. With earlier standards, an interruption in the bit stream required the decoder to discard information until the arrival of the next I-frame, causing the video to "freeze" for up to several seconds. MPEG-4 includes bit-stream resynchronization tools that allow a decoder to recover from data loss within the same frame, allowing it to better conceal transmission errors.
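The benefit of resynchronizing within a frame can be shown with a simplified simulation. It assumes one frame is split into packets that each begin at a resynchronization point, and it compares a decoder that resumes at the next such point with one that cannot. The packet layout and function names are hypothetical, and the second decoder is modeled as losing only the rest of the frame, which understates the wait for the next I-frame described above.

```python
def decode_with_resync(packets):
    """Each packet is (macroblock_ids, arrived_intact). Only damaged packets are lost."""
    decoded, concealed = [], []
    for macroblocks, intact in packets:
        (decoded if intact else concealed).extend(macroblocks)
    return decoded, concealed

def decode_without_resync(packets):
    """After the first damaged packet, the rest of the frame cannot be parsed."""
    decoded, concealed = [], []
    damaged = False
    for macroblocks, intact in packets:
        damaged = damaged or not intact
        (concealed if damaged else decoded).extend(macroblocks)
    return decoded, concealed

# One frame of nine macroblocks in three packets; the middle packet is corrupted.
frame = [([0, 1, 2], True), ([3, 4, 5], False), ([6, 7, 8], True)]

print(decode_with_resync(frame))
print(decode_without_resync(frame))
```

With resynchronization the decoder has to conceal only the three macroblocks in the damaged packet, whereas the other decoder must conceal six.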