Jalaj Jain, LSI Research and Development Pune Pvt. Ltd. Pravin Desale, LSI Research and Development Pune Pvt. Ltd. Pune, India
In this paper, we present a low-power transport demultiplexer that supports MPEG-2 transport streams for the ATSC and DVB digital broadcast standards. A novel window-based packet identification (PID) and section filtering scheme is presented to provide a cost-effective and flexible solution. The proposed transport demultiplexer is realized in silicon in a 90 nm process technology. Simulation results show that the proposed transport demultiplexer achieves a 60% saving in power consumption.
1 INTRODUCTION
Recent developments in interactive portable devices have created demand for a low-power, cost-effective multimedia system-on-chip (SoC) that can decode and display MPEG-2 video, audio and graphics. This requirement calls for a low-power transport demultiplexer, which is an integral part of a multimedia SoC. To meet the cost-effectiveness goal, the transport demultiplexer design should be easily extendable to support different broadcast formats as new demands arise. Many transport demultiplexers have been developed in the past; they can be broadly classified into three categories: software-only, hardware-only and hybrid approaches. The software-only approach is inefficient and consumes a large number of CPU cycles. The hardware-only approach is expensive and customized to support only a specific broadcast format. The hybrid approach exploits the advantages of both software and hardware and is therefore preferred over the other two. In recent years, various transport demultiplexer designs based on the hybrid approach have been proposed. Sumioka et al. proposed a multi-format transport demultiplexer for the ARIB, ATSC, DSS and DVB formats. Kovacevic proposed a transport demultiplexer design with the main emphasis on reducing system complexity and cost. However, existing solutions pay little attention to low-power design. Our work focuses primarily on a low-power, cost-effective transport demultiplexer that can be easily extended to support the DSS and ARIB formats. The paper is organized as follows. Section 2 discusses the targeted design goals. The proposed low-power transport demultiplexer is detailed in Section 3. Implementation results are summarized in Section 4. Section 5 presents conclusions and remarks.
2 DESIGN GOALS
The proposed architecture is designed to meet the following design goals.
2.1 Low power
To design a low-power architecture, we measured the statistical distribution of video, audio, system (table) and null packets over 31 days for an ATSC MPEG-2 transport stream, as shown in Figure 1. On average, video packets occupy 77% of the total bandwidth, compared with 4% for audio packets, 1% for table packets and 18% for null packets. Based on this analysis, we designed a state-machine-based control unit that schedules the clock for the different hardware stages. Since several hardware stages are idle most of the time, their clocks can be turned off. This forms the basis of the proposed low-power transport demultiplexer design.
2.2 Multi-format support
We observed that the ATSC and DVB transport stream formats are largely similar except for their system packets. For both formats, the program association table (PAT), program map table (PMT), conditional access table (CAT) and network information table (NIT) packets must be processed. However, the master guide table (MGT), terrestrial virtual channel table (TVCT), event information table (EIT), extended text table (ETT) and rating region table (RRT) packets are specific to the ATSC standard. The proposed architecture has a dedicated static-RAM-based unit to handle system packets, which can be extended to support more tables as needed for the ATSC standard.
2.3 Extended Functionalities
The proposed architecture is capable of demultiplexing up to 4 simultaneous, independent programmes carried together with other programmes. A look-up table (LUT) is used to store the packet identification (PID) values. The LUT size can be increased in future to support more PIDs and hence more simultaneous independent programmes. We also propose a novel window-based approach for transport header and adaptation field filtering, which facilitates extension to more programmes in future.
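As a rough software model of the LUT-based PID storage described above, the following sketch keeps a fixed-capacity table that maps each PID to a destination tag. The class name `PidLut`, the destination tags and the example PID values are hypothetical; only the 13-bit PID width (from ISO/IEC 13818-1) and the 64-entry capacity (from Table I) come from the specification and this paper.

```python
class PidLut:
    """Fixed-capacity PID look-up table (hypothetical software model)."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # PID -> destination tag

    def add(self, pid, destination):
        # A PID is a 13-bit field in the transport packet header.
        if not 0 <= pid <= 0x1FFF:
            raise ValueError("PID must fit in 13 bits")
        if pid not in self.entries and len(self.entries) >= self.capacity:
            raise OverflowError("PID LUT is full")
        self.entries[pid] = destination

    def lookup(self, pid):
        # Returns the destination tag, or None when the PID is not filtered.
        return self.entries.get(pid)


lut = PidLut(capacity=64)
lut.add(0x0000, "table")  # the PAT is carried on PID 0
lut.add(0x0031, "video")  # example PID values, chosen for illustration only
lut.add(0x0034, "audio")
```

Supporting more programmes then amounts to raising `capacity`, which mirrors the extension path described in the text.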
3 PROPOSED LOW POWER TRANSPORT DEMULTIPLEXER
In this section, we first briefly discuss the transport demultiplexing process and then describe the proposed architecture in detail. The transport demultiplexing process is shown in Figures 2(a), 2(b) and 2(c). A transport packet comprises 188 bytes: one byte for synchronization, 3 bytes of header carrying identification, scrambling and control information, and 184 bytes of payload data. During transport demultiplexing, synchronization lock is first achieved by detecting 5 synchronization bytes at intervals of 188 bytes or 204 bytes. Then, transport packets with PID 0 are filtered out to construct the PAT. The PAT carries the PIDs of the video and audio programs, which are presented to the user. Once the user has made a choice, the PMT is constructed for the selected video program. The PMT lists the PIDs of the CAT, video and table packets for the selected program. If the PMT carries more than one video or audio stream, the additional choices are offered to the user. For the selected video program, the transport demultiplexer filters the video packets, which are decoded by the MPEG-2 decoder. In this process, the transport demultiplexer also builds the electronic program guide (EPG) from the program specific information (PSI) tables. The EPG contains information on the video programs that will be transmitted in the next 3 or 6 hours. Video packets are descrambled if the scrambling flag in the header is set. During transport demultiplexing, the PAT and PMT are updated at regular intervals. The transport demultiplexer interacts with the host CPU through the status and error registers bank (SERB).
Based on the above discussion, we propose to divide the demultiplexing process into a sequence of tasks, each executed by a specialized hardware stage that operates with the other stages in a pipelined manner. A state-machine-based control unit schedules the clock among the hardware stages. The proposed low-power transport demultiplexer architecture is shown in Figure 3, and its specifications are listed in Table I.
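The packet layout described above can be illustrated with a short header parser. This is a minimal sketch that follows the 4-byte transport packet header layout of ISO/IEC 13818-1 (sync byte 0x47 followed by 3 header bytes); the function name `parse_ts_header` and the example packet are assumptions made for illustration, not part of the proposed hardware.

```python
SYNC_BYTE = 0x47
PACKET_SIZE = 188


def parse_ts_header(packet: bytes) -> dict:
    """Split the 4-byte transport packet header into its fields."""
    if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error_indicator": (b1 >> 7) & 0x1,
        "payload_unit_start_indicator": (b1 >> 6) & 0x1,
        "transport_priority": (b1 >> 5) & 0x1,
        "pid": ((b1 & 0x1F) << 8) | b2,  # 13-bit packet identifier
        "transport_scrambling_control": (b3 >> 6) & 0x3,
        "adaptation_field_control": (b3 >> 4) & 0x3,
        "continuity_counter": b3 & 0xF,
    }


# Example: an unscrambled, payload-only packet on PID 0 (where the PAT is carried).
example = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
header = parse_ts_header(example)
```

Here `header["pid"]` is 0, so a demultiplexer following the process above would route this packet to PAT construction.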
Table I: TS demultiplexer specifications
|Input stream format |DVB, ATSC |
|Standard |MPEG-2 |
|Maximum input rate (serial) |160 Mbit/s |
|Maximum number of transport inputs |4 |
|Output stream format |PES/ES |
|Maximum number of video outputs |4 |
|Maximum number of PID filters |64 |
The proposed transport demultiplexer is divided into the following units.
3.1 Front-end unit
The front-end unit works as an integrated receiver decoder (IRD). It mainly consists of a tuner interface, an analog-to-digital converter (ADC), a demodulator, a forward error correction (FEC) unit and an asynchronous serial interface (ASI). It sends 188-byte or 204-byte transport packets to the core unit as a serial bit stream.
3.2 Core unit
The core unit is divided into three major units.
3.2.1 Input interface and sync unit
The block diagram of the input interface and sync unit is shown in Figure 4. The main functions of this unit are as follows.
- Serial-to-parallel conversion.
- Recovery of the parallel clock from the serial clock.
- Synchronization of the incoming transport stream at 188-byte or 204-byte intervals.
A first-in-first-out (FIFO) buffer is used to smooth data transfer between the front-end unit and the core unit.
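The synchronization step can be sketched in software as a search for 5 sync bytes spaced one packet apart, matching the lock criterion stated in Section 3. The function name `find_sync_offset` and the test stream are hypothetical, and the sketch works on a byte buffer rather than on the serial bit stream handled by the hardware.

```python
SYNC_BYTE = 0x47


def find_sync_offset(stream: bytes, packet_size: int = 188, lock_count: int = 5):
    """Return the first offset at which lock_count sync bytes appear
    packet_size bytes apart, or None if no lock is found."""
    span = (lock_count - 1) * packet_size
    for offset in range(len(stream) - span):
        if all(stream[offset + i * packet_size] == SYNC_BYTE
               for i in range(lock_count)):
            return offset
    return None


# Three junk bytes followed by six 188-byte packets with zeroed contents.
stream_188 = b"\x00\x01\x02" + (b"\x47" + bytes(187)) * 6
lock_offset = find_sync_offset(stream_188)

# The same search with a 204-byte interval for a 204-byte packet stream.
stream_204 = (b"\x47" + bytes(203)) * 6
lock_offset_204 = find_sync_offset(stream_204, packet_size=204)
```

Trying both intervals in turn is one simple way to distinguish 188-byte from 204-byte streams; how the proposed sync unit selects the interval is not stated in the paper.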
3.2.2 Novel window based PID filtering and adaptation unit
The PID filtering and adaptation units are designed to support program clock reference (PCR) extraction, detection of the PIDs of user-selected programs and table packets, and transport packet validation. In the PID filtering unit, the PID in the header of the incoming transport packet is compared with the PIDs stored in the LUT to determine whether the transport stream packet needs to be parsed. If a PID match is found, the payload data is transferred to the clock recovery mechanism, the descrambler interface or the routing unit. As per the ISO/IEC 13818-1 standard, a transport packet should not be considered valid if any of the following conditions holds.
- Transport_error_indicator (1 bit) is set to 1.
- Transport_scrambling_control (2 bits) is set to binary 01, 10 or 11.
- Adaptation_field_control (2 bits) is set to binary 00 or 01.
- Adaptation_field_length (8 bits) is less than 0x08. In this case, the PCR field is either missing or incomplete.
- PCR_flag (1 bit) is set to 0. This indicates that the adaptation field does not contain a PCR even though the PID of the transport stream packet matches the PCR PID specified in the PMT for the given program.
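The checks listed above can be sketched as a function that returns the reasons a packet is rejected for PCR extraction. The sketch assumes the ISO/IEC 13818-1 field positions and semantics: the 2-bit fields are compared as binary values, and a PCR_flag of 1 means that a PCR is present. The function name `rejection_reasons` and the example packets are hypothetical.

```python
def rejection_reasons(packet: bytes, min_af_length: int = 0x08) -> list:
    """List why a 188-byte packet on the PCR PID is not usable."""
    reasons = []
    b1, b3 = packet[1], packet[3]
    if (b1 >> 7) & 0x1:
        reasons.append("transport_error_indicator set")
    if (b3 >> 6) & 0x3:
        reasons.append("packet is scrambled")
    adaptation_field_control = (b3 >> 4) & 0x3
    if adaptation_field_control in (0b00, 0b01):
        # Reserved value or payload only: there is no adaptation field to parse.
        reasons.append("no adaptation field")
        return reasons
    adaptation_field_length = packet[4]
    if adaptation_field_length < min_af_length:
        reasons.append("adaptation field too short for PCR")
    if adaptation_field_length == 0 or not (packet[5] >> 4) & 0x1:
        reasons.append("PCR_flag not set")
    return reasons


# Adaptation-field-only packet with PCR_flag set: accepted.
good = bytes([0x47, 0x01, 0x00, 0x20, 0xB7, 0x10]) + bytes(182)
# Error indicator set and payload only: rejected for two reasons.
bad = bytes([0x47, 0x81, 0x00, 0x10]) + bytes(184)
```

Returning a list of reasons rather than a single boolean makes it easier to map each failure onto a separate status or error register bit.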
MPEG-2 audio/video synchronization requires the PCR and a snapshot of the local system time clock (STC) at the exact instant at which the PCR arrives. The adaptation unit supports PCR extraction and time stamping.
There are two possible ways to time-stamp the incoming transport packets, which are as follows.
- Parse the transport stream packet containing PCR and time stamp it (on the fly).
- Fixed interval time-stamping (offline alternative).
In both modes, the PID filtering and adaptation unit parses the transport packets as follows.
- Detect the transport packets carrying PCR.
- Filter out the 42-bit PCR.
- Request STC information from STC generator.
It then prepares the following table for each transport stream, with entries in the order listed below.
- 42-bit PCR value.
- 42-bit STC value.
- Memory address of the timestamped transport packet.
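The PCR extraction and table entry described above can be sketched as follows. In ISO/IEC 13818-1 the PCR occupies 6 bytes of the adaptation field: a 33-bit base, 6 reserved bits and a 9-bit extension, which gives the 42 significant bits mentioned in the text. The names `extract_pcr`, `pcr_to_27mhz` and `TimestampRecord`, as well as the STC value and memory address in the example, are hypothetical.

```python
from collections import namedtuple

# One table entry per time-stamped packet, in the order given in the text.
TimestampRecord = namedtuple("TimestampRecord", "pcr stc address")


def extract_pcr(packet: bytes) -> int:
    """Return the 42-bit PCR (33-bit base followed by 9-bit extension)."""
    raw = int.from_bytes(packet[6:12], "big")  # 48 bits as transmitted
    base = raw >> 15          # upper 33 bits, in 90 kHz units
    extension = raw & 0x1FF   # lower 9 bits, in 27 MHz units
    return (base << 9) | extension  # the 6 reserved bits are dropped


def pcr_to_27mhz(pcr: int) -> int:
    """Convert the packed 42-bit PCR into a count of 27 MHz ticks."""
    return (pcr >> 9) * 300 + (pcr & 0x1FF)


# Example packet with PCR base = 1 and extension = 5 (reserved bits set to 1).
raw_pcr = (1 << 15) | (0x3F << 9) | 5
packet = (bytes([0x47, 0x01, 0x00, 0x20, 0xB7, 0x10])
          + raw_pcr.to_bytes(6, "big") + bytes(176))

# The STC value and buffer address stand in for the hardware-provided ones.
record = TimestampRecord(pcr=extract_pcr(packet), stc=0x123, address=0x1000)
```

The clock recovery mechanism can then compare `record.pcr` with `record.stc` to estimate the offset between the encoder clock and the local clock.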
Figure 5 shows the block diagram of the moving-slide-window-based PID filter and adaptation unit. To parse 4 simultaneous transport streams, 4 dedicated moving slide windows are used. Data interpretations for moving slide windows 1 and 2 are shown in Figures 6 and 7.
After the transport header and adaptation field are parsed, the transport packet payload data is transferred to the descrambler. The transport header includes a 2-bit field (transport_scrambling_control) that indicates whether the transport packet is scrambled with the default, even or odd control word. If this field is set, the payload data is transferred to the external descrambler. The external descrambler uses entitlement control messages (ECM) and entitlement management messages (EMM) to generate the control words, which are then used to descramble the data.
3.3 Back-end unit
The back-end unit consists of a direct memory access (DMA) unit, a routing unit and a table section filtering unit. The routing unit manages the video, audio and system buffers. It receives control signals from the global control unit to store the payload data through DMA in the appropriate buffers allocated in SDRAM. To smooth data transfer, 3 FIFOs are used for temporary storage of video, audio and system data. The DMA has 3 write channels, one each for video, audio and system data. The table section filtering unit compares 10 maskable bytes of each incoming section with the table section filters. A CRC check is also performed to validate the table payload data.
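The masked comparison and CRC check can be sketched as below. The CRC is the 32-bit CRC that ISO/IEC 13818-1 specifies for PSI sections (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection), for which a section with its CRC appended yields a remainder of zero. The paper does not say which 10 bytes are compared, so the sketch assumes the first 10 bytes of the section; the function names and example values are hypothetical.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32 as used for MPEG-2 PSI sections."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def section_matches(section: bytes, match: bytes, mask: bytes) -> bool:
    """Compare the leading bytes of a section under a per-bit mask."""
    if len(section) < len(match) or len(match) != len(mask):
        return False
    return all((s & m) == (v & m) for s, v, m in zip(section, match, mask))


# Filter on table_id only (0x00 is the PAT); the other 9 bytes are masked out.
pat_match = bytes([0x00]) + bytes(9)
pat_mask = bytes([0xFF]) + bytes(9)

# Illustrative section body with its CRC appended (contents are arbitrary).
body = bytes([0x00, 0xB0, 0x0D, 0x00, 0x01, 0xC1, 0x00, 0x00,
              0x00, 0x01, 0xE0, 0x20])
section = body + crc32_mpeg2(body).to_bytes(4, "big")
accepted = section_matches(section, pat_match, pat_mask) and crc32_mpeg2(section) == 0
```

A mask bit of 0 makes the corresponding section bit a "don't care", so one filter can cover a whole family of tables.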
3.4 Control unit
Figures 8(a) and 8(b) show the flow graph of the control unit. Depending on the current status of the transport packet, it schedules the clock for all the hardware stages. If the transport packet carries video payload data, the control unit disables the clock for the PCR extraction unit and the table filtering unit. Similarly, if the transport packet carries table payload data, the control unit decides whether the existing table needs to be updated. If not, the control unit disables the clock for the sync unit, the PCR extraction unit and the table filtering unit. The control unit also disables the clock for all the hardware stages if table packets are invalid. The sync unit is enabled once every 188 bytes, since it compares only the first byte of the transport packet with the synchronization byte.
3.5 Status and error registers bank (SERB)
Each unit has a set of status and error registers. Application software accesses these registers to control and monitor operation. For example, if synchronization is lost, the transport demultiplexer sets the interrupt register to indicate that incoming transport packets are invalid.
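The clock scheduling rules stated above can be summarized in a small behavioural model. This is only a sketch of the decisions described in the text, not the state machine of Figure 8: the stage names in `STAGES`, the packet kinds and the function `enabled_stages` are hypothetical, and packet kinds not covered by the text leave every stage clocked.

```python
STAGES = frozenset({"sync", "pid_filter", "pcr_extraction",
                    "table_filter", "routing"})


def enabled_stages(packet_kind, table_needs_update=False, valid=True):
    """Return the set of stages whose clock stays enabled for a packet."""
    if packet_kind == "table" and not valid:
        # Invalid table packet: gate the clock of every stage.
        return frozenset()
    if packet_kind == "video":
        return STAGES - {"pcr_extraction", "table_filter"}
    if packet_kind == "table" and not table_needs_update:
        return STAGES - {"sync", "pcr_extraction", "table_filter"}
    return STAGES


video_clocks = enabled_stages("video")
stale_table_clocks = enabled_stages("table", table_needs_update=False)
```

Since video packets make up most of the stream (77% in Figure 1), the video branch determines most of the time for which the PCR extraction and table filtering clocks are gated.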
4 IMPLEMENTATION RESULTS
The proposed low-power transport demultiplexer has been described in Verilog and synthesized with the Magma flow for the TSMC 90 nm library at 300 MHz. The total gate count, including internal random access memory (RAM) and FIFOs, is 210k, which results in a silicon area of 0.7 mm². Figure 9 shows the percentage of time for which the various hardware stages in the proposed transport demultiplexer are turned off. On average, the sync unit is turned off for 99% of the total operating time, the adaptation unit for 20%, the PID filtering unit for 24% and the table processing unit for 99%. This results in an overall power saving of 60% compared with a software-only solution.
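The paper does not state how the overall figure is derived from the per-stage idle times. As a back-of-the-envelope sketch only, assume that each of the four stages draws the same power when clocked and negligible power when gated; the saving relative to an ungated design is then the average idle fraction. Both assumptions, the function `average_idle` and the optional weights are hypothetical.

```python
# Idle time per stage, in percent of total operating time (from Figure 9).
idle_percent = {
    "sync": 99,
    "adaptation": 20,
    "pid_filter": 24,
    "table_processing": 99,
}


def average_idle(idle, weights=None):
    """Weighted average idle percentage; equal weights by default."""
    if weights is None:
        weights = {name: 1.0 for name in idle}
    total = sum(weights.values())
    return sum(idle[name] * weights[name] for name in idle) / total


equal_weight_estimate = average_idle(idle_percent)
```

Under these assumptions the estimate is 60.5%, which is close to the reported 60%. A more faithful estimate would pass per-stage power weights taken from synthesis reports through the `weights` argument.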
5 CONCLUSION
This paper has presented a low-power, flexible architecture for a transport demultiplexer and its ASIC implementation. The proposed transport demultiplexer is capable of demultiplexing 4 transport streams simultaneously and supports 64 PID filters. We conclude that the proposed architecture can provide a low-power solution for transport demultiplexing in MPEG-2 based video codecs targeted at handheld consumer electronics products. Since the architecture is flexible, it can be easily extended to other standards such as ARIB and DSS.
REFERENCES
Yen-Kunang Chen, Sun-Yuan Kung, “Trends and Challenges with System-on-chip Technology for Multimedia system design”, Emerging Information Technology Conference 2005, 15-16 August 2005.
C. Hanna, D. Gillies, E. Cochon, A. Dorner, J. Alred, and M. Hinkle, “Demultiplexer IC for MPEG-2 Transport streams”, IEEE Transactions on Consumer Electronics, Volume 41, Issue 3, August 1999, pp. 699-706.
Dongliang Guan, Songyu Yu, Changtai Liang, Xingdong Wang, “MPEG-2 TS Generate System and its Implementations with FPGA”, 4th International Conference on ASIC, October 2001, pp. 510-513.
Yukio Fujii, “Implementation of MPEG transport demultiplexer with a RISC-based microcontroller”.
P. Stammnitz, K. Gruneberg, “Hardware Implementation of the Transport Stream Demultiplexer for the HDTV Demonstrator”, Signal Processing of HDTV, VI, 1995.
Tetsuji Sumioka, Chris Samwald, Masahiko Hoshina, Hirosi Adachi, and Nobuichi Watanabe, “A Multi-format Transport Demultiplexer for ARIB, ATSC, DSS and DVB formats”, International Conference on Consumer Electronics, June 2000, pp. 390-391.
Branko Kovacevic, “A Cost effective chip-set for HDTV with 2D/3D graphics”, International Conference on Consumer Electronics, 2000.
Figure 1: Video, audio and system packets distribution for ATSC transport stream
Figure 2(a): Transport demultiplexing process flow graph
Figure 2(b): Transport demultiplexing process flow graph
Figure 2(c): Transport demultiplexing process flow graph
Figure 3: Transport demultiplexer block diagram
Figure 4: Input interface and sync-unit block diagram
Figure 5: PID filter unit and adaptation unit block diagram
Figure 6: Data interpretation for moving slide window 1
Figure 7: Data interpretation for moving slide window 2
Figure 8(a): Low power state-machine flow graph
Figure 8(b): Low power state-machine flow graph
Figure 9: Idle time (% of total operating time) for various hardware stages in the proposed transport demultiplexer