Tracking SoC Performance

SoC architecture tuning can improve system performance by a factor of four or five for essentially the same building blocks.

By Brian Clinton, Duolog
reprinted with permission by OCP-IP

Today's heterogeneous System-on-Chip (SoC) architectures contain a variety of building blocks, such as CPUs, DSPs, peripheral IP and subsystems. They also include the modern Network on Chip (NoC), which acts as the central communication glue for these blocks.

The SoC architect has to find the best way to hook everything together so that the desired applications can run, fitting the pieces of the jigsaw together to squeeze the required performance out of all the available building blocks in the system.

In fact, the importance of this phase of SoC development can't be overstated. SoC architecture tuning can improve system performance by a factor of four or five for essentially the same building blocks. This might mean not having to move to the latest technology node, or being able to leave out expensive dedicated hardware blocks, giving the product a better chance of reaching the market sooner and with a competitive advantage.

A major weapon in the SoC assembler's arsenal is the use of socket-based mechanisms. The advantage of socket-based protocols is that, ideally, blocks can be plugged in and played as the architecture is being assembled.

When blocks have native socket interfaces, performance is optimized as they are plugged together with other interfaces. The most comprehensive socket-based protocol is the OCP socket from OCP-IP. The benefits of sockets are enormous, not only for assembly but also for transaction and performance validation, because there are now probe points throughout the system that share a common communication protocol. Nowhere is this more true than in the NoC fabric, where performance tuning begins and ends.

Performance analysis and tuning during SoC development are not confined to the high-level ESL (electronic system level) space; they take place at other levels too. Because SoC development has become a highly parallel, iterative process, the inevitable tweaks and changes to the system, usually initiated in the architecture-level model, need to feed down into the RTL and emulation levels of the design flow. In other words, performance analysis is not a game played only at the ESL level, and key performance metrics need to be checked and updated continually throughout the development lifecycle.

Figure 1. SoC development flow.

As the development flow in Figure 1 shows, a set of performance criteria, in the form of testcases or stress scenarios, helps close the loop between the different development disciplines. The architecture team passes key architectural testcases or stress scenarios to the RTL validation teams, together with key metrics such as critical latency, bandwidth, throughput and bus occupancy figures.

As the highly parallel development engine runs and iterates, levels a, b, c and d in Figure 1 are continually validated for performance. However, sharing and reporting these key performance metrics is difficult, as is viewing and understanding information about the simulated system at each of these levels. Moreover, the metrics that need to be checked may vary from level to level. Yet it is clearly necessary to understand how the architecture performs with different types of realistic traffic.

By understanding the real bottlenecks in the system, the architect can make informed architectural decisions and trade-offs that improve system performance for the types of data the system handles. Figure 2 shows how part of a typical heterogeneous SoC is organized. The NoC contains an address map that routes each initiator's requests to the appropriate target. The red points mark possible monitor points where interesting traffic can be analyzed.
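To illustrate what an initiator-to-target address mapping does, the following Python sketch decodes a request address to the target that owns it. The region layout and the target names (DDR, UART, DMA) are hypothetical; a real NoC implements this decode in hardware from its own configuration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    base: int     # first byte address of the region
    size: int     # region size in bytes
    target: str   # hypothetical target name


class AddressMap:
    """Maps a request address to the target whose region contains it."""

    def __init__(self, regions: List[Region]):
        self.regions = sorted(regions, key=lambda r: r.base)
        self.bases = [r.base for r in self.regions]
        # An address map must be unambiguous, so reject overlapping regions.
        for a, b in zip(self.regions, self.regions[1:]):
            if a.base + a.size > b.base:
                raise ValueError(f"{a.target} overlaps {b.target}")

    def decode(self, addr: int) -> Optional[str]:
        # Find the last region starting at or below addr, then bounds-check it.
        i = bisect_right(self.bases, addr) - 1
        if i >= 0 and addr < self.regions[i].base + self.regions[i].size:
            return self.regions[i].target
        return None  # unmapped address (a decode error)


amap = AddressMap([
    Region(0x0000_0000, 0x1000_0000, "DDR"),
    Region(0x4000_0000, 0x1000, "UART"),
    Region(0x4000_1000, 0x1000, "DMA"),
])
print(amap.decode(0x4000_1004))  # DMA
print(amap.decode(0x2000_0000))  # None
```

Sorting the regions once and using a binary search keeps each lookup logarithmic in the number of regions.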

Figure 2. A typical heterogeneous SoC.

Introducing OCP-Tracker

OCP-Tracker is a monitor and analysis toolset that allows performance, statistical and transaction analysis of OCP interfaces. OCP events are monitored and written in the standard OCP trace format. A transaction rebuilder reconstructs OCP transactions from the trace files and builds up a data store. The OCP transaction data can then be analyzed through a 3D 360° navigation engine. User queries can be created, saved and used in both the 3D 360° navigation engine and the reporting tools.

Performance statistics are calculated based on user queries. Reports are then automatically formatted and generated.
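The transaction rebuilding and statistics steps can be sketched in Python as follows. The event record is a hypothetical, simplified stand-in, not the actual OCP trace format, and the sketch assumes that every request receives a response and that responses return in order for a given channel, thread and tag.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical, simplified trace record (not the real OCP trace format)."""
    time_ns: float
    channel: str   # monitor point, e.g. "cpu0"
    kind: str      # "req" or "resp"
    thread: int
    tag: int
    cmd: str       # e.g. "RD" or "WR"
    nbytes: int


@dataclass
class Transaction:
    channel: str
    cmd: str
    start_ns: float
    end_ns: float
    nbytes: int

    @property
    def latency_ns(self) -> float:
        return self.end_ns - self.start_ns


def rebuild(events: List[Event]) -> List[Transaction]:
    """Pair each response with the oldest outstanding request on the same key."""
    pending = defaultdict(deque)
    txns = []
    for ev in sorted(events, key=lambda e: e.time_ns):
        key = (ev.channel, ev.thread, ev.tag)
        if ev.kind == "req":
            pending[key].append(ev)
        elif pending[key]:
            req = pending[key].popleft()
            txns.append(Transaction(ev.channel, req.cmd, req.time_ns,
                                    ev.time_ns, req.nbytes))
        else:
            raise ValueError(f"response without a request on {key}")
    return txns


def channel_stats(txns: List[Transaction], channel: str) -> Optional[Dict[str, float]]:
    """Count, latency and bandwidth (bytes/ns, i.e. GB/s) for one channel."""
    sel = [t for t in txns if t.channel == channel]
    if not sel:
        return None
    lat = [t.latency_ns for t in sel]
    span = max(t.end_ns for t in sel) - min(t.start_ns for t in sel)
    return {
        "count": len(sel),
        "avg_latency_ns": sum(lat) / len(lat),
        "max_latency_ns": max(lat),
        "bandwidth_GBps": sum(t.nbytes for t in sel) / span if span else 0.0,
    }


events = [
    Event(0, "cpu0", "req", 0, 0, "RD", 64),
    Event(10, "cpu0", "req", 0, 0, "RD", 64),
    Event(40, "cpu0", "resp", 0, 0, "RD", 0),
    Event(50, "cpu0", "resp", 0, 0, "RD", 0),
]
print(channel_stats(rebuild(events), "cpu0"))
```

Keying pending requests per channel, thread and tag is what lets overlapping requests on the same channel be matched to the right responses.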

Visualization

For complex, heterogeneous SoCs, the ability to visualize and analyze performance metrics is paramount to understanding and verifying system behavior and then fine-tuning the system for optimal performance. This means not just looking at metrics over time but also being able to filter on various fine-grained aspects of the system. OCP-Tracker's 2D and 3D viewing capabilities let metrics be viewed in a natural, intuitive way. Figure 3 shows a typical 3D analysis, in which the bottlenecks in the system can be seen concisely and problem areas quickly identified and understood.

In this example, the three axes show time, metric value and channel. The data can readily be zoomed and filtered, and the displayed metric can easily be changed, giving an essentially unlimited number of ways to view the data depending on which parameters need to be understood or verified.
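The data behind a time, channel and metric view can be sketched as a grid of per-channel bandwidth values over fixed time bins. This Python sketch is illustrative only: the (channel, end_ns, nbytes) tuple layout is an assumption, and crediting each transaction's bytes to the bin in which it completes is a simplification. Zooming amounts to re-binning a narrower time range with a smaller bin width.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bandwidth_grid(txns: Iterable[Tuple[str, float, int]],
                   bin_ns: float,
                   t0: float = 0.0,
                   t1: float = float("inf")) -> Dict[str, List[float]]:
    """Return {channel: [bytes/ns in each time bin]} for transactions ending in [t0, t1).

    Each transaction's bytes are credited to the bin in which it completes.
    """
    txns = [t for t in txns if t0 <= t[1] < t1]
    if not txns:
        return {}
    nbins = int((max(t[1] for t in txns) - t0) // bin_ns) + 1
    grid: Dict[str, List[float]] = defaultdict(lambda: [0.0] * nbins)
    for channel, end_ns, nbytes in txns:
        grid[channel][int((end_ns - t0) // bin_ns)] += nbytes / bin_ns
    return dict(grid)


txns = [("cpu0", 5, 64), ("cpu0", 15, 64), ("dsp", 12, 128)]
grid = bandwidth_grid(txns, bin_ns=10)             # full view
zoomed = bandwidth_grid(txns, bin_ns=5, t0=10, t1=20)  # zoom into [10, 20) ns
```

Each channel's list forms one row of the surface: time bins along one axis, channels along another, and bandwidth as the height. Swapping bandwidth for latency or occupancy changes only the value accumulated per bin.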

Querying

Given the enormous amount of data produced by a performance-related simulation, a way of querying different types of data and metrics from the data store is a must. Being able to share queries efficiently, both between teams and within a team, also decreases the risk of misinterpretation and ensures smoother communication.

OCP-Tracker has a diverse array of querying options that give quick, easy access to the most critical areas of interest. Queries are not limited to finding latencies, bandwidths and other metrics above or below a certain threshold; they also support fine-grained analysis right down to the transaction level. For example, particular OCP transaction types on a monitored channel can be singled out and analyzed very quickly.
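A saved, shareable query can be sketched as a small declarative object that filters rebuilt transactions and serializes to JSON, so that another team can load and rerun exactly the same query. The field names and the JSON layout are assumptions for illustration, not OCP-Tracker's actual query format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class Query:
    """Hypothetical saved query; None in a field means "match anything"."""
    name: str
    channel: Optional[str] = None
    cmd: Optional[str] = None            # e.g. "RD" or "WR"
    min_latency_ns: Optional[float] = None
    max_latency_ns: Optional[float] = None

    def matches(self, txn: Dict) -> bool:
        lat = txn["end_ns"] - txn["start_ns"]
        return ((self.channel is None or txn["channel"] == self.channel)
                and (self.cmd is None or txn["cmd"] == self.cmd)
                and (self.min_latency_ns is None or lat >= self.min_latency_ns)
                and (self.max_latency_ns is None or lat <= self.max_latency_ns))

    def run(self, txns: List[Dict]) -> List[Dict]:
        return [t for t in txns if self.matches(t)]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "Query":
        return cls(**json.loads(text))


# Single out slow reads on the DSP channel, then round-trip the query
# through JSON as if another team had loaded it from a shared file.
slow_dsp_reads = Query("slow DSP reads", channel="dsp", cmd="RD", min_latency_ns=200)
shared = Query.from_json(slow_dsp_reads.to_json())
```

Keeping the query declarative, rather than as ad hoc filtering code, is what makes it easy to save, share and reapply unchanged.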

Figure 3. OCP-Tracker 3D viewer.

Figure 4. Screenshot from the OCP-Tracker query builder.

Because stress testcases involve simulating or capturing data over a long timeframe, performance measurements are valid only at certain times, guided by certain events in the simulation. For example, bandwidth and latency measurements on a particular socket are only really meaningful when the source traffic on that socket has been set up correctly and is flowing. Averaging bandwidth over the full simulation could skew the results.
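A small worked example shows how a full-simulation average can skew the result. In this hypothetical run, a socket is idle for the first 10 µs while the system boots and is configured, then streams 64 bytes every 10 ns for the next 10 µs.

```python
from typing import Iterable, Tuple


def avg_bandwidth(samples: Iterable[Tuple[int, int]], t0: int, t1: int) -> float:
    """Average bandwidth in bytes/ns over [t0, t1) from (time_ns, nbytes) samples."""
    total = sum(nbytes for t, nbytes in samples if t0 <= t < t1)
    return total / (t1 - t0)


# Hypothetical traffic: nothing until 10 us, then one 64-byte transfer every 10 ns.
samples = [(t, 64) for t in range(10_000, 20_000, 10)]

full = avg_bandwidth(samples, 0, 20_000)         # 3.2 bytes/ns over the whole run
steady = avg_bandwidth(samples, 10_000, 20_000)  # 6.4 bytes/ns in steady state
```

The whole-run figure understates the steady-state bandwidth by half, purely because of the idle setup period.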

Event triggers that indicate a valid measurement are usually set up in the simulation itself.

Typically, the embedded processor(s) or testbench in the simulation controls when the conditions are set up and ready for the calculation to take place. The processor(s) or testbench initializes the system, configures peripherals and subsystems, and starts data transfers in the various multimedia blocks.

Once the conditions are right and the simulation activity is in the optimum start position, the performance metrics should be measured. These controlled conditions allow the exact scenario to be analyzed and make the testcase deterministic. In OCP-Tracker, this type of query is called an event-based query. Figure 5 shows in more detail how to configure these queries.

Figure 5. Event Query.

Overall, the OCP-Tracker Query Builder supports three classes of event. OCP events include all OCP transactions occurring during the simulation. OOB (out-of-band) events include DMA, SoC and interrupt events. The final class is the Time event, which is simply a specified time during the simulation used as a trigger. The Query Builder lets the user specify three different triggers for event queries: "Start After," "Measure From" and "Measure To." Each trigger can be an event of any of these classes.
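The way the three triggers narrow a simulation down to a measurement window can be sketched in Python. The trigger semantics here are an assumption inferred from the trigger names: each trigger fires on the first matching event strictly after the previous trigger, and a Time trigger fires at its specified time. The event names (IRQ, DMA_START, DMA_DONE) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Ev:
    time_ns: float
    cls: str    # event class: "OCP" or "OOB"
    name: str   # hypothetical event name, e.g. "WR" or "DMA_START"


# A trigger is ("OCP", name), ("OOB", name) or ("TIME", time_ns).
Trigger = Tuple[str, Union[str, float]]


def resolve(events: List[Ev], after_ns: float, trig: Trigger) -> Optional[float]:
    """Time at which trig fires strictly after after_ns, or None if it never does."""
    cls, value = trig
    if cls == "TIME":
        return value if value > after_ns else None
    for ev in events:
        if ev.time_ns > after_ns and ev.cls == cls and ev.name == value:
            return ev.time_ns
    return None


def measurement_window(events, start_after, measure_from, measure_to):
    """Chain Start After -> Measure From -> Measure To; return (t0, t1) or None."""
    events = sorted(events, key=lambda e: e.time_ns)
    t = float("-inf")
    fired = []
    for trig in (start_after, measure_from, measure_to):
        t = resolve(events, t, trig)
        if t is None:
            return None  # a trigger never fired, so there is no valid window
        fired.append(t)
    return fired[1], fired[2]


events = [
    Ev(100, "OOB", "IRQ"),
    Ev(200, "OCP", "WR"),
    Ev(300, "OOB", "DMA_START"),
    Ev(900, "OOB", "DMA_DONE"),
]
window = measurement_window(events, ("OOB", "IRQ"),
                            ("OOB", "DMA_START"), ("OOB", "DMA_DONE"))
```

Only transactions whose timestamps fall inside the returned window would then feed the latency and bandwidth calculations.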

Report Generator

Reporting performance statistics requires extracting results from simulation runs, a notoriously laborious and error-prone task. OCP-Tracker exports graphical images and generates reports automatically based on the simulation activity, so the reports contain data taken from the actual simulation rather than just an interpretation of the results.

Figure 6. Report Manager.

These reports can then be inserted automatically into higher-level architectural reports and passed to technical management for analysis. Each generated report shows activity between initiators and targets within the scope of the data. A report can be based on a query rather than on the full simulation, in which case it is likely to be shorter and more focused. Reports can be exported in Rich Text Format, MS Access database format, MS Excel worksheet format, Snapshot format, plain text, HTML and XML.
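An initiator-to-target activity summary, the core of such a report, can be sketched in Python as a simple aggregation written out as CSV. CSV stands in here for the export formats the tool supports, and the (initiator, target, nbytes) input layout is an assumption.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple


def initiator_target_report(txns: Iterable[Tuple[str, str, int]]) -> str:
    """CSV summary of transaction count and bytes per (initiator, target) pair."""
    counts: Counter = Counter()
    volume: Counter = Counter()
    for initiator, target, nbytes in txns:
        counts[initiator, target] += 1
        volume[initiator, target] += nbytes
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["initiator", "target", "transactions", "bytes"])
    for initiator, target in sorted(counts):
        key = (initiator, target)
        writer.writerow([initiator, target, counts[key], volume[key]])
    return out.getvalue()


report = initiator_target_report([
    ("cpu0", "DDR", 64), ("cpu0", "DDR", 64),
    ("dsp", "DDR", 128), ("cpu0", "UART", 4),
])
print(report)
```

A report based on a query would simply pass in the filtered transactions instead of the full set.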

Performance Regression

While the highly iterative, parallel process of SoC development is in full swing, key performance metrics need to be monitored continually, not only at the ESL level but at all the levels indicated in Figure 1. A mechanism is therefore needed that continually checks that these metrics still hold as the system changes.

Performance regression verifies compliance with these metrics as the system is modified. Testcases aimed specifically at key metrics ensure that bugs have been fixed, that no new bugs have been introduced and that new features have not caused existing features to fail.

OCP-Tracker's regression management tool fits into an existing simulation environment in a non-intrusive way. Simulation monitors are switched on in performance regression testcases, and the Regression Manager polls the target simulation. As soon as the end of the simulation is detected, the golden files are read and the queries and expected results are extracted from them.

Each query in the golden file is run against the simulation data, and its result is checked against the golden result, giving a pass or fail for that query. Once all queries have run, the regression test receives an overall PASS if no query failed, or an overall FAIL if any query in the golden file failed. The results of the regression, including each individual query result, are stored in a text file.
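The pass/fail logic of a golden-file check can be sketched in Python. The golden-file layout (a JSON list of rules, each naming a query, a comparison operator and an expected value) and the run_query callback are hypothetical; they are not OCP-Tracker's actual golden-file format.

```python
import json
import operator
from typing import Callable, Tuple

# Comparison operators a golden rule may use (hypothetical rule syntax).
OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def run_regression(golden_text: str,
                   run_query: Callable[[str], float]) -> Tuple[str, str]:
    """Check every golden rule; return (overall result, text report)."""
    lines = []
    overall = "PASS"
    for rule in json.loads(golden_text):
        value = run_query(rule["query"])
        ok = OPS[rule["op"]](value, rule["expected"])
        if not ok:
            overall = "FAIL"  # any failing query fails the whole test
        verdict = "PASS" if ok else "FAIL"
        lines.append(f"{rule['query']}: {value} {rule['op']} "
                     f"{rule['expected']} -> {verdict}")
    lines.append(f"OVERALL: {overall}")
    return overall, "\n".join(lines) + "\n"


def save_results(path: str, report: str) -> None:
    """Store the per-query and overall results in a text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(report)


golden = json.dumps([
    {"query": "cpu0_avg_latency_ns", "op": "<=", "expected": 120},
    {"query": "ddr_bandwidth_GBps", "op": ">=", "expected": 3.0},
])
measured = {"cpu0_avg_latency_ns": 95, "ddr_bandwidth_GBps": 3.4}
result, report = run_regression(golden, measured.__getitem__)
```

In a real flow, run_query would execute each saved query against the rebuilt transaction data store, and save_results would write the report alongside the test case.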

Figure 7. Regression Dashboard.

Once a test case has a result, a number of its details can be viewed. Double-clicking a completed test case row opens an options pop-up, as illustrated in Figure 8.

Figure 8. Regression Options.

A number of tasks can be performed from this test case options dialog, as follows:

View Regression Results: This file contains a number of items, including the name of each trace file parsed and imported into the database and the name and result of each query applied against the database. It also lists the success of each stage of parsing and importing the data, along with the actual query results.
Open OCP Tracker Database: As expected, this option opens the specific OCP Tracker test case database. This is useful for viewing specific results in more detail.
View Error Message: If the regression fails with a failure code of ERR, an error log file is created in the root of the test case folder. This option displays that file.
View (Edit) Golden File: Once a regression test case has completed and the results are available, you can use this option to change the golden file rules before running the test case again.
Repeat Golden Test (New Rules): After the Golden file rules have been modified using the above option, this will re-submit the test case to the Regression queue.
Repeat Testcase (New Logfiles): As an alternative to repeating the test case with new rules, you can repeat the test case with new log files.

Summary

SoC architecture tuning has the potential to improve system performance by a factor of four or five. The OCP socket from OCP-IP offers enormous benefits: a comprehensive socket that provides a common protocol on the communication channels in the SoC makes debug and performance analysis much easier.

As the highly iterative, parallel process of SoC development moves through its different functional stages, both performance and functionality need to be measured and checked continually. Although a system may be functionally correct, a single undetected performance bug can cause an expensive respin.

OCP-Tracker is a monitor and analysis toolset that allows performance, statistical and transaction analysis of OCP interfaces. Visualization of performance related data is one of the keys to understanding how the system can cope with different types of traffic, enabling the timely identification and understanding of weaknesses in the architectural structure of the SoC.

Being able to craft test traffic deterministically, and to calculate and test performance metrics only when it is meaningful to do so, is what makes complex performance analysis efficient. OCP-Tracker's querying capabilities allow performance data in the simulation to be extracted and viewed efficiently and easily. As SoC development proceeds, continually verifying compliance with key performance metrics ensures that fixed bugs remain fixed, new bugs are not introduced and new features don't cause existing features to fail.

Duolog's OCP Conductor is FREE to OCP-IP Members as part of their entitlement subscription.
