by B. Ramanadin, JF. Vizier, C. Bertaux, STMicroelectronics - Grenoble, FRANCE
In today's SOCs, on-chip busses have evolved from pure routers into smart IPs able to make better use of the bus throughput. To do so, control features have been added that allow dynamic configuration of the bus behavior (priority scheme, routing arbitration scheme, bandwidth arbitration scheme, latency-critical scheme...). In theory, this makes it possible to align the system behavior with the application requirements. Unfortunately, these features only provide control over the bus behavior; they give no observability of the impact on real traffic. As a result, application software ends up choosing an arbitrary usage of the hardware rather than a better one.
This paper presents a reusable method (including hardware, software, and external equipment) to allow the characterization of the traffic around the bus during real applications.
Many SOC products at STM use STM's preferred on-chip bus, called STBus (see footnote 1). This IP connects Initiators to Targets. An initiator sends requests to a target and then receives responses from it; a target receives the request, processes it and then sends the response. The aim of the STBus is to route requests and responses. A set of configuration registers allows the way requests and responses are handled to be changed dynamically. The traffic, in terms of priority, latency and bandwidth, is therefore subject to change. This has a major impact on how the application software should behave.
In order to give application engineers some understanding of what is happening around the bus inside the chip, an STBus Analyzer was designed.
Note1: the analyzer was designed to be easily extendable to analyse third party busses on future chips where several interconnect busses will co-exist.
Note2: to avoid software intrusiveness, the analyzer offers a mode where all events are signaled by outputting trace messages rather than by using the usual interrupt lines.
Building on the STBus program, the analyzer was developed following the guiding principles below.
STBus - Methods
- Design Abstraction: Separate information from interface.
- Design Reuse: Develop generic code, enabling the IP to be connected to any number of STBus ports, of any kind.
- Design Assembly: Provide models to develop the system and its software. Provide tools to ensure quality and verification of IP. Provide templates to allow construction of systems.
This analyzer can be used at several stages of the application bring-up. The figure below illustrates how the bus analyzer fits into the whole validation system.
The Analyser is embedded in the System on Chip. It monitors some signals and computes metrics about them. The results of those metrics are output on a dedicated port of the SOC. Those results are then transmitted to a computer, which receives them through an acquisition board. Once received, those results are stored in the mass memory of the computer. The resulting file can then be processed to generate a human readable report.
2. SW Debug and application need:
2.1. Memory corruption debug
At early stages, a usual problem is memory corruption. This can occur when an IP accesses an address which is reserved for another purpose. The best-known case is a write in the address range reserved for the instructions.
The analyzer offers a functionality called Watchpoint. It observes the transactions going through the selected bus ports and compares them against pre-programmed patterns. A pattern can consist of an address range, an operation code and some other characteristics of the transaction. Once the selected fields are seen on one of the spied bus ports, the transaction is captured and either an interrupt occurs or a trace message is sent.
This functionality can be used to detect memory corruption. In this case, the matching pattern is programmed with the address range of the instruction code and the opcodes corresponding to stores. If a store is attempted in the instruction range, the transaction is captured. It is then possible to know which initiator made this operation and what the data was.
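The matching rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Transaction` and `Watchpoint` classes, the field names, the opcode strings and the address values are hypothetical and are not taken from the analyzer's register map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    initiator: str   # hypothetical initiator name
    opcode: str      # hypothetical opcode names: "LOAD" or "STORE"
    address: int
    data: int


@dataclass(frozen=True)
class Watchpoint:
    addr_lo: int          # first address of the watched range
    addr_hi: int          # last address of the watched range (inclusive)
    opcodes: frozenset    # opcodes that trigger a capture

    def matches(self, t: Transaction) -> bool:
        # A transaction matches when both the address and the opcode match.
        in_range = self.addr_lo <= t.address <= self.addr_hi
        return in_range and t.opcode in self.opcodes


def capture(watchpoint, traffic):
    """Return the transactions that the watchpoint would capture."""
    return [t for t in traffic if watchpoint.matches(t)]


# Watch for stores into an (assumed) instruction range 0x0000-0xFFFF.
code_guard = Watchpoint(0x0000_0000, 0x0000_FFFF, frozenset({"STORE"}))
traffic = [
    Transaction("cpu", "LOAD", 0x0000_0100, 0),
    Transaction("dma", "STORE", 0x0000_0200, 0xDEADBEEF),
    Transaction("cpu", "STORE", 0x0002_0000, 0x1234),
]
hits = capture(code_guard, traffic)
for t in hits:
    print(f"{t.initiator} wrote {t.data:#x} at {t.address:#x}")
```

In this model only the store issued by the `dma` initiator falls inside the watched range, so it is the only captured transaction; the captured record carries the initiator and the data, as in the text.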
2.2. Code profiling
When it is time to optimize CPU software, code profiling is a usual method. The analyzer provides all the necessary infrastructure to allow code profiling to be handled in hardware, which makes it less intrusive for the software.
The Analyser provides a fast printf feature which sends trace messages containing a 32-bit data word and a time stamp. The time stamp is generated by a counter running at the system clock frequency. It is sent in the message and indicates when the message was generated. The data sent in the message is the content of a control register. A single write to this register issues the message.
This functionality can be used to trace the execution of the CPU code. By adding a fast printf at the beginning of each critical function, it is possible to know when and how many times those functions were launched. This does not have a significant impact on the software, as a single write operation is needed to issue the message.
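On the host side, such a trace could be turned into a simple profile as sketched below. The sketch assumes that the fast printf messages have already been decoded into `(timestamp, data)` pairs and that the software writes a distinct tag value per function; both the pair format and the tag values are illustrative assumptions.

```python
from collections import defaultdict


def profile(fpf_messages):
    """Summarise fast printf messages given as (timestamp_cycles, data) pairs."""
    stamps = defaultdict(list)
    for timestamp, data in fpf_messages:
        stamps[data].append(timestamp)

    report = {}
    for tag, ts_list in stamps.items():
        # Distance in clock cycles between consecutive calls of the same function.
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        report[tag] = {
            "calls": len(ts_list),
            "first": ts_list[0],
            "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        }
    return report


# Hypothetical trace: tag 0xA1 marks one function, tag 0xB2 another.
trace = [(100, 0xA1), (250, 0xB2), (400, 0xA1), (700, 0xA1)]
result = profile(trace)
for tag, stats in result.items():
    print(f"tag {tag:#x}: {stats}")
```

Because the time stamp counter runs at the system clock frequency, the gaps are expressed in system clock cycles.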
2.3. Application specific events
To bridge the gap between application/bring-up engineers and design engineers, it is useful to know the value of some proprietary information (interrupt vectors, internal register contents...) when a hardware-specific event occurs (video synchronisation signals, interrupts...).
The analyser allows data sampling triggered by an asynchronous signal. Once the data is sampled, a trace message is generated containing the time and the sampled data.
One usage is the spying of a video pipe in which several synchronisation signals are present. This way, we can know the time at which each of them occurs and track any problem in their relative timings.
2.4. Traffic characterization
One of the topics bus architects have to address is the amount of bandwidth to allocate to each IP. Characterizing the traffic generated by an IP is therefore very important, but it is not always trivial: a video compressor behaves the same way for each picture, while the data transfers of a CPU depend on the task being executed.
The Analyser computes two kinds of metrics: bandwidth and latency. The bandwidth metric gives a bus occupation time, i.e. the number of clock cycles during which the routing resource was available to a bus port. There are two latency metrics. The first one, the bus access time, gives the maximum time a bus access had to wait before being granted. The second one, the request processing time, is the time elapsed between the sending of a transaction on the bus and the receiving of the associated response.
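The three metrics can be illustrated with a small transaction-level model. This is a sketch, not the analyzer's actual computation: the `Access` record and its fields are hypothetical, and reporting the request processing time as a maximum over the frame is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    request: int    # cycle at which the port asked for the bus
    grant: int      # cycle at which the access was granted and sent
    cycles: int     # cycles during which the port held the routing resource
    response: int   # cycle at which the associated response was received


def frame_metrics(accesses):
    """Compute the metrics for one port over one measurement frame."""
    return {
        # Bandwidth metric: bus occupation time in clock cycles.
        "occupation": sum(a.cycles for a in accesses),
        # Latency metric 1: longest wait before a bus access was granted.
        "max_access_time": max((a.grant - a.request for a in accesses), default=0),
        # Latency metric 2: longest delay between sending and response.
        "max_processing_time": max((a.response - a.grant for a in accesses), default=0),
    }


metrics = frame_metrics([
    Access(request=0, grant=2, cycles=4, response=12),
    Access(request=10, grant=15, cycles=8, response=40),
    Access(request=50, grant=51, cycles=4, response=60),
])
print(metrics)
```

Working at the level of such records, rather than bus signals, mirrors the separation between the physical interface and the computation part.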
The computation of those metrics is done at the transaction level. The interface with the physical layer is separated from the computation part, so it is easy to apply the measurements to another bus by changing only this simple interface.
The measurements are done during a configurable time frame. Each time the time frame ends, the results of the metrics are sent in trace messages. The smaller the frame is, the more accurate the results are, but the more trace data is generated. A compromise has to be found between data size and result quality.
3. Outputs of the SBATM:
There are two ways to get the information computed by the SBATM. The first one is to read it from the SBATM control registers. This mode is intrusive but allows the embedded software to react dynamically to the values read. The second one is to output trace messages. In the following, CPU means the processing unit embedded in the system on chip.
A. Intrusive mode
In this mode, the embedded software has to access the SBATM control registers. This requires adding load operations to the existing code, which can be a big problem when the CPU is running a real-time application.
To prevent the CPU from polling the status registers, the SBATM features a maskable interrupt line which is raised when a transaction matches the pattern pre-programmed in the watchpoint (see section 2.1). When this interrupt is raised, a status register indicates why the interrupt occurred. The CPU can then read the value of the metric which generated the interrupt.
The interrupt is cleared by issuing a write operation to a status register. As long as the interrupt flag has not been cleared, the value of the metric is frozen. This means that if another matching condition occurs before the interrupt is cleared, this information is lost.
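The freeze behaviour can be modelled in a few lines. The class below is a behavioural sketch only: its name and methods are hypothetical, and the `lost` counter exists purely to make the loss visible in the model; the text does not say that the hardware counts lost events.

```python
class WatchpointStatus:
    """Behavioural model of the frozen value while an interrupt is pending."""

    def __init__(self):
        self.pending = False   # interrupt flag
        self.captured = None   # value frozen at the time of the match
        self.lost = 0          # model-only bookkeeping, not a documented register

    def on_match(self, value):
        if self.pending:
            # The value is frozen until the flag is cleared: this match is lost.
            self.lost += 1
            return
        self.captured = value
        self.pending = True

    def read(self):
        return self.captured

    def clear(self):
        # Models the write to the status register that clears the interrupt.
        self.pending = False


status = WatchpointStatus()
status.on_match(0x1000)   # raises the interrupt and freezes the value
status.on_match(0x2000)   # arrives before the clear: lost
first = status.read()
status.clear()
status.on_match(0x3000)   # captured again after the clear
second = status.read()
print(hex(first), hex(second), status.lost)
```

The model shows why the interrupt service routine should read and clear as early as possible when matches may occur in quick succession.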
B. Non intrusive mode
In this mode, all the data computed by the SBATM are output on an external port. The software only needs to configure the control registers of the IP at initialisation. The width of the output port is chosen when generating the RTL code; allowed values are 4, 8, 16 and 32 bits. The clock frequency of this data bus is obtained by dividing the system clock frequency and can be changed dynamically during operation. It can be up to one half of the system clock frequency.
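The peak throughput of the trace port follows directly from its width and clock. The helper below is a sketch of that arithmetic; the function name, the `divider` parameter (with 2 as the smallest ratio, matching the "one half" limit) and the 100 MHz system clock in the example are assumptions chosen for illustration.

```python
ALLOWED_WIDTHS = (4, 8, 16, 32)   # port widths selectable at RTL generation


def peak_output_bandwidth(width_bits, system_clock_hz, divider=2):
    """Peak trace port throughput in bytes per second."""
    if width_bits not in ALLOWED_WIDTHS:
        raise ValueError(f"unsupported port width: {width_bits}")
    if divider < 2:
        raise ValueError("the port clock is at most half the system clock")
    # One word of width_bits is transferred per port clock cycle.
    return width_bits * system_clock_hz // (8 * divider)


# Example with an assumed 100 MHz system clock.
for width in ALLOWED_WIDTHS:
    rate = peak_output_bandwidth(width, 100_000_000)
    print(f"{width:2d}-bit port: {rate / 1e6:.0f} Mbyte/s")
```

This kind of estimate helps to choose the port width so that the trace traffic expected for a given frame length fits on the port.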
The following table gives the composition of a trace message.
The fields are given in the order in which they are sent on the data port. The message length gives the length of the message in bytes; it ranges from 2 to 24. The message type indicates the kind of message. There are 8 kinds of trace messages, listed in the next section. The port identifier tells which port the message relates to. Those three fields constitute the header of the message.
The payload is the data carried by the trace message. This data can be composed of several fields.
The time stamp indicates the time at which the message was generated. It is the value of a 48-bit free-running counter started at the beginning of the application.
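A decoder for one message could look like the sketch below. The field order (length, type, port identifier, payload, time stamp) and the 48-bit time stamp come from the description above; everything else is an assumption made for illustration: the header is packed here into two bytes (5-bit length and 3-bit type in the first byte, port identifier in the second), multi-byte fields are big-endian, and a message too short to hold a time stamp is treated as having none.

```python
from dataclasses import dataclass
from typing import Optional

HEADER_BYTES = 2      # assumed packing: [length:5 | type:3] [port:8]
TIMESTAMP_BYTES = 6   # 48-bit free-running counter


@dataclass(frozen=True)
class TraceMessage:
    length: int
    msg_type: int
    port: int
    payload: bytes
    timestamp: Optional[int]


def decode(raw: bytes) -> TraceMessage:
    """Decode one complete trace message under the assumed layout."""
    if len(raw) < HEADER_BYTES:
        raise ValueError("message shorter than its header")
    length = raw[0] >> 3
    msg_type = raw[0] & 0x07
    port = raw[1]
    if not 2 <= length <= 24 or length != len(raw):
        raise ValueError("inconsistent length field")
    body = raw[HEADER_BYTES:]
    if len(body) >= TIMESTAMP_BYTES:
        payload = body[:-TIMESTAMP_BYTES]
        timestamp = int.from_bytes(body[-TIMESTAMP_BYTES:], "big")
    else:
        payload, timestamp = body, None
    return TraceMessage(length, msg_type, port, payload, timestamp)


# Build a 12-byte example: header, 4-byte payload, 6-byte time stamp.
raw = (bytes([(12 << 3) | 1, 3])
       + (0xCAFEF00D).to_bytes(4, "big")
       + (1000).to_bytes(6, "big"))
msg = decode(raw)
print(msg)
```

With the real field widths from the message table, only the constants and the two header-unpacking lines would need to change.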
4. Trace messages
There are 8 types of trace messages:
- REF: This is the reference message. Metrics are computed during a time frame. The reference message is issued each time a time frame expires. The length of the time frame is configurable.
- FPF: This is the fast printf message. It is issued when a store access is done in the fpf_data control register. The data stored in this register is the payload of the message.
- SYNC: This is the synchronization message. It is issued each time the SBATM is reconfigured or when FFFFFF is stored in the fast printf data register.
- BLMI: This message contains the bandwidth and latency measurement for initiators. These are all the metrics computed for initiator ports (see section 2.4). This message is generated on a regular basis, just after a reference message is issued.
- BLMT: This message is equivalent to the BLMI but it concerns the target ports.
- PI: This is the proprietary information message. Its generation is triggered by asynchronous signals spied in the system.
- BUS: The bus message is generated each time the watchpoint mechanism matches a transaction. The message contains the fields of the matching transaction.
- ERR: This message is sent each time an error occurs. The causes of error are overflows in the computation of metrics.
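Since the BLMI and BLMT messages follow the reference message of their time frame, a post-processing tool can use REF messages as frame delimiters. The sketch below illustrates this grouping; the numeric type codes and the `(type, port, payload)` tuple format are hypothetical, as the actual encoding is not given here.

```python
from enum import Enum


class MsgType(Enum):
    # The numeric codes are placeholders, not the real encoding.
    REF = 0
    FPF = 1
    SYNC = 2
    BLMI = 3
    BLMT = 4
    PI = 5
    BUS = 6
    ERR = 7


def group_by_frame(messages):
    """Attach each BLMI/BLMT message to the REF message that precedes it."""
    frames = []
    for kind, port, payload in messages:
        if kind is MsgType.REF:
            frames.append({"ref": payload, "metrics": []})
        elif kind in (MsgType.BLMI, MsgType.BLMT) and frames:
            frames[-1]["metrics"].append((kind.name, port, payload))
        # Other message types are not tied to a frame in this sketch.
    return frames


stream = [
    (MsgType.REF, 0, "frame 0"),
    (MsgType.BLMI, 1, "a"),
    (MsgType.FPF, 0, "x"),
    (MsgType.BLMT, 2, "b"),
    (MsgType.REF, 0, "frame 1"),
    (MsgType.BLMI, 1, "c"),
]
frames = group_by_frame(stream)
for frame in frames:
    print(frame["ref"], frame["metrics"])
```

Asynchronous messages such as FPF, PI or BUS can be interleaved with the metric messages, which is why the grouping filters on the message type rather than on position alone.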
5. External equipment:
5.1. The hardware
Since the bus analyzer generates trace messages, an acquisition system must capture and store them on a host for later processing. The acquisition system is fairly traditional except for the size and bandwidth of the data it must sustain. The sustained bandwidth is up to 50 Mbyte/s, and the overall amount of data that can be captured is on the order of several GBytes.
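These two figures bound the length of a capture session at full rate. The short calculation below is only a worked example: the 4 GByte capture size is an assumed value standing in for "several GBytes", and decimal units are used.

```python
def capture_seconds(capture_bytes, rate_bytes_per_s=50_000_000):
    """Duration of a capture at a constant trace rate (50 Mbyte/s by default)."""
    return capture_bytes / rate_bytes_per_s


# Assumed 4 GByte capture at the maximum sustained rate.
duration = capture_seconds(4_000_000_000)
print(f"{duration:.0f} s of continuous trace")
```

At lower trace rates, for example with longer measurement frames, the same storage covers a proportionally longer run of the application.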
The hardware consists of a general purpose acquisition board. It is architected around an FPGA, so the input data protocol can easily be changed.
5.2. The software
A. Global vision
Post-processing is a software phase that consists of parsing the binary trace file and generating human readable reports using the SBATM package programs. These programs are developed to ease the interpretation of the trace messages generated by the SBATM. The complete SW package of the analyzer is always the same, whatever the implementation characteristics are. It is made of a set of programs used to decode and interpret the trace messages generated by the analyzer. The main objective of the SW package is to provide a generic, reusable and easily expandable piece of SW that allows automatic analysis of the trace messages. To achieve this objective, the following SW architecture was chosen:
The main pieces of the SW package are:
- SBATM Basic Driver: Provides all the tools needed to acquire trace messages and store them in mass memory.
- btf2hrtf (Binary Trace File to Human Readable Trace File translator): Reads the binary file composed of stored trace messages and decodes it into a human readable file format. This is done by decoding the fields contained in the trace messages.
- Reports generator: Extracts specific information from the trace file and represents it in an easily usable format, such as histograms and curves.
- User interface: Eases the usage of all the programs and links them automatically.
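As an illustration of the kind of output a reports generator can produce, the sketch below bins a list of latency values into a text histogram. It is not the SBATM package code: the function names, the bin width and the sample values are hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; keys are the lower bound of each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def render(hist, bin_width):
    """Render the histogram as text, one line per non-empty bin."""
    lines = []
    for low, count in hist.items():
        lines.append(f"{low:>5}-{low + bin_width - 1:<5} {'#' * count}")
    return "\n".join(lines)


# Hypothetical latency samples, in clock cycles.
latencies = [3, 7, 12, 14, 15, 31]
hist = histogram(latencies, bin_width=10)
print(render(hist, bin_width=10))
```

The same binned data could equally be exported for plotting as curves by an external tool.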
The environment above is focused on the SSP, but in the final applications this package will be integrated in a “top integrated circuit software”, which will be responsible for the full circuit control, as shown in the figure:
That is the reason why the acquisition board is architected around an FPGA. Extra pins are provided on the input/output port of the acquisition board, so that it is possible to receive or send other signals, such as a digital video flow.
STBus is considered as a reusable Soft-IP as it is generated and tuned for each application chip. The analyzer presented above is as re-usable as the bus is. It can be generated at the same time the STBus is generated and tuned to comply with each port characteristics.
Regardless of how many ports are observed, whether application specific events are observed or not, whether code profiling is activated or not, the output of the bus analyser is always the same, the external acquisition system remains the same, and the statistical analysis tool remains the same.
To all people involved in the SBAG project (architecture, software, board design, silicon design, verification, staff, management) and more specifically to: Olivier Roulenq, Heni Arab, Yassine Elkhourassani, Abdelouahid Zakriti, Dominique Henoff, Christian Berthet, Jean-Marc Chateau, Nicolas Bigot.
1 The STBus spec is now public (protocol, concepts and layers, see: http://www.stmcu.com/inchtml-pages-STBus_intro.html)