By Robert Payne, Sr. Vice President/General Manager, Advanced System Technology, Paul Wiscombe, SoC Architecture and Technology, Product Planning, and Advanced System Technology, Chief Technology Office, Philips Semiconductors, San Jose, Calif.
In the new world of streaming data, video content and the need for media processing have become ubiquitous. Video content, traditionally found in home and office environments, is moving into mobile and body networks, and there is a growing need to connect and manage the data flows within and between these two environments seamlessly. Given the ever-increasing percentage of software in products, the availability of semiconductor hardware resources and market influences, what will the streaming data system-on-chip (SoC) architecture of the future look like?
Architectures that support video processing must combine low-power, high-speed processing engines with efficient data movement between processing units in a manner that does not negatively impact software development productivity.
The dominant SoC communication challenge for media processing designs is the sheer volume of data that has to be moved through the processing units. New digital standards have caused this to grow even more. For example, the move from standard definition to high definition TV can increase the video bandwidth required by a factor of six.
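The factor of six can be checked with simple arithmetic. The sketch below is illustrative only: the article does not say which formats it compares, so the frame sizes (720x480 for standard definition, 1920x1080 for high definition), the 30 frames/second rate and the 2 bytes per pixel are all assumptions, and the function name is hypothetical.

```python
def raw_video_rate(width, height, fps, bytes_per_pixel):
    """Uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel


# Assumed formats (not specified in the article):
# SD = 720x480, HD = 1920x1080, both at 30 frames/s and 2 bytes per pixel.
sd_rate = raw_video_rate(720, 480, 30, 2)
hd_rate = raw_video_rate(1920, 1080, 30, 2)
ratio = hd_rate / sd_rate

print(f"SD: {sd_rate / 1e6:.1f} Mbytes/s")
print(f"HD: {hd_rate / 1e6:.1f} Mbytes/s")
print(f"HD/SD ratio: {ratio:.1f}")
```

Under these assumptions the ratio comes out at 6, because the frame rate and pixel depth cancel and only the pixel count per frame differs. Other SD and HD format pairs would give a different figure.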
In addition, there is a growing requirement for SoC designs to absorb functional complexity internally in order to hide complexity from the end user and thereby produce simple-to-use products. For example, a user may wish to record one input stream on a hard drive, while watching a DVD live, while also routing audio or video through an external home network to another room.
This translates to a need for high-end solutions to simultaneously support multiple media processing operations and multiple streaming data paths within the SoC architecture. Of course, this places even more requirements on the internal communication mechanisms of the SoC to move data efficiently and to keep the audio and video streams aligned. The ability to manage tasks across multiple streaming engines in software is key to supporting high-end streaming data products.
The traditional industry solution to the multimedia problem has been to deploy a single CPU and to relentlessly increase its speed to produce enough real-time processing power for media processing tasks. This approach results in an inefficient solution with high power dissipation, since the general-purpose processor is not optimized to perform any of the very specific media processing tasks.
Within the Nexperia Video SoC platform, a solution has been developed that is tuned to media processing applications and moves away from the high-speed monoprocessor. Optimized for data movement and streaming, it solves the problem by "computing in space" with a heterogeneous combination of task- and software-thread-optimized parallel processors, rather than "computing in time" by trying to timeshare a monoprocessor.
One of the fundamental architectural decisions is based on the recognition that two distinct types of computation are required, control flow and streaming data flow, and that they are best served by different types of processor.
Complex control flow needs a "branching" optimized type of processor that can handle convoluted decision flows with numerous if-then-else statements, and that efficiently supports features such as branch prediction and pre-fetch. These needs are served well by traditional RISC processors such as ARM and MIPS.
In contrast, manipulation of video and audio streams needs a "streaming" type of processor that has simpler decision control capability, but that is often heavily pipelined and optimized for processing large volumes of data. VLIW DSPs satisfy these requirements.
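The difference between the two kinds of computation can be seen in a small sketch. Neither function comes from the Nexperia software; the names, the packet fields and the fixed-point gain format are hypothetical, chosen only to contrast branch-heavy control code with a uniform per-sample kernel.

```python
def classify_packet(header):
    """Control-flow style: nested, data-dependent decisions and almost no arithmetic."""
    if header["type"] == "video":
        if header["codec"] == "mpeg2":
            return "mpeg_decoder"
        return "software_decoder"
    elif header["type"] == "audio":
        return "audio_decoder"
    return "discard"


def scale_brightness(pixels, gain):
    """Streaming style: the same multiply, shift and clamp applied to every sample.

    gain is an assumed 8.8 fixed-point value, so 256 means 1.0x and 384 means 1.5x.
    """
    return [min(255, (p * gain) >> 8) for p in pixels]


route = classify_packet({"type": "video", "codec": "mpeg2"})
scaled = scale_brightness([0, 100, 200], 384)
print(route)   # mpeg_decoder
print(scaled)  # [0, 150, 255]
```

The first function does little work per call but its path depends on the data, which is where branch handling matters. The second does identical work on every element, which is the pattern that pipelined, wide-issue engines are built to exploit.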
In the current Nexperia Video SoC platform, a MIPS CPU is used as the main control processor and a VLIW DSP as the main media engine. Custom hardware blocks, or fixed-function streaming engines, augment these software-programmable resources.
Fixed-function streaming engines perform specific compute-intensive tasks such as MPEG decoding, video scaling and video composition. The cost of developing IP blocks for these functions can be amortized over several designs because they can be reused in many products.
They are optimized for low power, small die size and high throughput. The use of fixed-function streaming engines frees up capacity in the programmable engines, and provides flexibility to choose between software and hardware solutions for many tasks. However, it can add to software complexity, so standard components for these tasks are provided in the streaming software architecture.
We've implemented a sophisticated software architecture to manage and synchronize multiple parallel streaming tasks and maintain efficient data flow, with specialized software to manage the audio/video streams and the partitioning of tasks across multiple processors.
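The general idea of running streaming tasks in parallel and keeping them synchronized can be sketched with threads and bounded FIFOs. This is a generic illustration, not the Nexperia streaming software: the function names, the sentinel object and the queue depth are all assumptions, and the two lambdas merely stand in for real decode and scale tasks.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream


def stage(func, inbox, outbox):
    """Run one streaming task: apply func to each item until the stream ends."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # pass the end marker downstream
            return
        outbox.put(func(item))


def run_pipeline(funcs, items, depth=4):
    """Chain tasks with bounded FIFOs; each task runs in its own thread."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks when the first FIFO is full
        queues[0].put(_END)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is _END:
            break
        results.append(out)

    for t in workers + [feeder]:
        t.join()
    return results


# Stand-ins for a decode task followed by a scale task.
frames = run_pipeline([lambda x: x * 2, lambda x: x + 1], range(5))
print(frames)  # [1, 3, 5, 7, 9]
```

The bounded queues provide back-pressure: a fast producer blocks once its output FIFO is full, so no task runs arbitrarily far ahead of the one that consumes its data. Because each task is a single thread reading a FIFO, item order is preserved from input to output.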
A highly capable on-chip communication infrastructure is needed to support this architecture and to efficiently transport streaming data. In the Nexperia SoC platform, there is a single streaming-data backbone, independent of and separate from a Memory Mapped I/O network.
Memory access uses a Pipelined Memory Access Network (PMAN) and a Memory Transaction Layer (MTL) protocol that is optimized for communicating to a memory controller. PMAN is a high bandwidth, hierarchical communication path between device IP and memory. Each processor has direct access to the memory controller.
Currently, in the Nexperia platform, the memory system is clocked at 250 MHz to satisfy a 1 Gbyte/second data rate requirement. Point-to-point transfers between IP use a Device Transaction Layer (DTL) protocol that is similar to, but predates, the VSIA Virtual Component Interface (VCI) standard. The Device Control and Status Network (DCSN) provides a low latency communication path for the CPUs and other initiators to access registers in IP blocks.
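The memory clock and data rate quoted above fix how much data must move on each clock. The short calculation below is a sketch that assumes 1 Gbyte means 10^9 bytes and that the rate is a peak figure with no allowance for overhead; the article does not state the memory interface width, so the 32-bit result is an inference rather than a published specification, and the function name is hypothetical.

```python
def bytes_per_cycle(data_rate_bytes_per_s, clock_hz):
    """Average number of bytes that must be transferred on every clock cycle."""
    return data_rate_bytes_per_s / clock_hz


# Figures from the text; 1 Gbyte is assumed to mean 10**9 bytes.
required_bytes = bytes_per_cycle(1e9, 250e6)
required_bits = required_bytes * 8

print(f"{required_bytes:.0f} bytes ({required_bits:.0f} bits) per 250 MHz clock")
```

Under these assumptions the memory path has to deliver 4 bytes per clock, for example over a 32-bit path transferring once per cycle. A real design would need headroom beyond this for refresh, page misses and arbitration.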
This is a flexible and evolving architecture. IP blocks can be readily added or removed, and the on-chip interconnect can be recreated using automated generator tools. The physical design strategy breaks the netlist into "Islands of Synchronicity" that bound critical timing relationships into single, manageable entities. Island-to-island data transfers use skew-tolerant asynchronous and source-synchronous techniques to ease top-level timing closure.
SoC designs for streaming data processing applications need a combination of general-purpose "branching" processors and optimized "streaming" engines to supply enough processing capacity with reasonable power dissipation. They must also employ a high-bandwidth on-chip communication infrastructure to efficiently move streaming data between processing units and main memory.