Sharan Basappa R, HCL Technologies
Constrained random verification, for quite some time now, has been the default verification methodology for complex ASIC/SoC designs. Central to this methodology is the process of letting the randomization engine choose values from a set of possible values. The main lever available to guide this process of randomization is by appropriate constraints. Due to this predominant theme of randomization approach in the methodology, it is quite natural for verification engineers to approach every verification problem using randomization and then applying constraints to get appropriate control.
While constrained randomization is not, per se, an unacceptable approach, the question remains whether it is the most optimal approach for all type of stimulus that is required for verification. For example, as one goes from block level verification to SoC level, the level of randomness keeps reducing; in other words, the level of constraints increase. Similarly, as we go from block to SoC level verification, strict sequencing becomes critical and randomization has to be done keeping this sequencing requirement in mind. Consider for example, generation of processor instruction sequences. Needless to say, an effort to build such a sequence of instructions starting from basic random variables and then using constraints to create valid instruction sequence is rather cumbersome and time consuming. And added to that, if additional instructions are introduced later, making changes isn’t easy. At some point, this can become almost unmaintainable.
Randsequence construct, which is a part of SV, can be very invaluable in such situations, as the rest of this article will explain.
If the possible sequences are represented in the form of a tree or graphs then randsequence allows one to formally capture such a structure and allows users to create sequence for a selected path. In a way, the tree/graph represents all possible desired sequences in the overall verification space and each distinct path (from root to leaf) represents a specific sequence or scenario. This can be co-related to a scenario in verification parlance.
Conceptually, randsequence has its genesis in parsers, which is part of a modern high level language compiler Hence, it is useful to understand some basics of parsers in order to understand randsequence and exploit its full power. So, let us take a detour into the basics of parsers before we deep dive into randsequence.
Introduction to parsers
Almost every modern programming language (and that also includes SystemVerilog) is a context free language and is formally defined using what is known as BNF (Backus Naur Form). Context free form means that occurrence of constructs in the language is not dependent on the context in which they appear. It is this feature which makes all high level programming languages very powerful, with innumerable ways to achieve a given functionality. The flexibility of such a language can be seen by looking at a simple expression in ‘C’ language.
- y = 10+5
- y = a + 5
- y = 10 + a
- y = a + b
- y = a + (b+c) –> b + c is an expression within the main expression
- y = a + (b+(c+d)) – b + (c+d) is an expression within the main expression; c+d is a sub-expression within this
As you can see above, there are so many ways to describe an expression. It is the compiler that takes a program containing code snippets like the one above and then converts them into a machine executable code. A compiler has few distinct stages of processing as explained below,
Lexing: Every language has a valid set of words using which meaningful sentences are formed. This is nothing but lexicon for the given language. For example, “;”, “if”, “switch”, “continue”, “const” are all example of Lexical words in C language. Identifying the basic words of a language is the job of the Lexar.
Lexing is the normally the first stage in a compiler and is the process of identification of legal words in the language. Lexar (AKA scanner) tags the identified words with internal representation (or encoding) and dispatches them to Parser. These are collectively known as Terminal tokens as they represent the most basic elements in that language which cannot be divided further.
Parser: While Lexar finds all legal words in a program, it is not so useful unless these legal words appear in a specific order. If Lexar finds the legal words in a given language, Parser tries to check if these terminals appear in any specific sequence as defined by the language. For example, “y = a+b;” is a legal syntax while “y = a+;” is not even though the individual terminals are all legal. That is, y, a, + and ; are all legal as far as Lexar is concerned.
For example, if “y=a+b;” is the user’s program then Lexar finds terminals for characters/words like “y”, ”a”, ”b”, ”=”, “;” and passes them to Parser as terminals tokens. Parser takes these terminals and checks if they appear in a specific sequence. If a legal sequence is found then Parser creates a data structure that is called an internal data structure.
Both Lexar and parser work based on the rules specified. This is explained below through a simple example.
While it is a non-trivial task to implement Lexar and Parser, there are tools to simplify these. For example, Flex and Bison are GNU tools using which one can implement Lexing and Parsing respectively. One has to write rules as defined in the language and these are handled by Flex and Bison to output Lexar and Parser code applicable to that language. So, instead of writing the entire code to do Lexing and Parsing, one only has to write the rules while Flex and Bison generate actual Lexar and Parser codes.
Flex rules are written to find terminals in a given grammar and are generally written as Regular Expressions (Regex) rules. We will not go into further details since our area of interest is actually Parser.
Bison rules are written to find meaningful utterances in the input. These rules are also known Non Terminals or Production rules.
Let us now look at a Parser rules written using Bison to implement a reverse polish calculator
In the code listing above, P1-P3 are productions rules; “NUM” is the terminal token detected by Lexar when a number is detected by it. “ADD”, “SUB”, “MUL”, “DIV”, “EXP” are all terminals that are detected by Lexar corresponding to characters +, -, *, / and ^. NL standards for newline characters detected by Lexar.
Now let us see what happens at the parser when a user feeds in “2 3 +” as input.
Since Lexar is the first stage in a compiler, it identifies “2”,”3” and ”+” as terminals NUM, NUM and ADD and sends them to Parser. The processing in Parser will happen in the following manner,
- Terminal NUM (“2”) is fed (by Lexar to Parser) as input and results in production completion for exp, as per P3 rule
- exp itself is part of P3. Hence P3 is partially complete due to rules exp exp ‘ADD’ etc. At this point, P3 is waiting for exp as the next terminal from lexar. So, whether P3 would be complete depends on what terminal arrives next
- Remember exp is also a part of production P2. Hence P2 is partially hit due to P3
In summary, all 3 steps above happened with the input of just one character “2”
At this point, P3 is partially complete; P2 is partially complete
- Next, terminal NUM (3) is fed to parser. This hits production P3 as per the rule - exp: NUM
- Production exp (P3) itself will create partial production for P3 as per the rule – exp: exp exp * (* here signifies ADD/SUB etc.). A couple of points to note here
- As there are multiple rules corresponding to exp: exp exp …, there is partial production for all these rules. Once the next terminal is received, the right production would be resolved. This is assuming ADD/SUB/MUL is one of the terminal from Lexar
- Production P3 itself was partially complete based on the reception of the first terminal NUM (2)
- As per point 3) above, P2 which was partially hit and waiting for NL terminal will be nullified as the next terminal was NUM (instead of NL that was expected as per P2)
- Next, ADD is sent by Lexar based on user input of + character. This will complete P3 as per the rule exp: exp exp ADD. The other productions as per P3 will be nullified
- Now again, P2 would be partially hit due to production exp as per P3
- Next, user provides new line as input. This completes production P2
- Completion of P2 completes production P1
At this point, Parser is done processing Terminal ADD and waits for next Terminal to arrive from Lexar.
Few points to note before we complete this detour on Parsers and get back to our main topic of randsequence,
- Almost all modern programming languages are context free. This means that a given production (that is, legal utterance) is not valid only when it appears in a specific context but can appear in many ways as defined by the language.
- Grammars for programming languages are written using BNF format (Backus Naur form). Production rules written for Parser have close resemblance to BNF description. However, Parser additionally allows users to define actions associated with productions.
- A Parser is pretty much useless if it only detects legal utterances for a given grammar. To keep things simple, we have not shown actions associated with every production. In reality, Parsers actually execute certain actions and typically they create data structures for use by downstream stages of the compiler.
- Parsers generally operate based on LALR concept. LALR stands for look ahead left right processing. Here, look ahead refers to the fact that sometimes multiple productions are possible and Parser needs additional Terminals in order to unambiguously determine the correct production. It should be noted that typical parsers are of type LALR(1), which means that parser requires only one look ahead Terminal to clearly identify production. It should also be noted that it is language that defines how many look ahead terminals a parser needs for proper parsing and is an inherent property of the language, and not that of the parser.
- Lexars operate based on regular expressions. While regular expressions are pretty powerful at detecting patterns they are aren’t useful when it comes to sequences and recursions that are at the heart of any context free grammars. This is what Parsers do elegantly.
- Parser is always specific to a grammar. For example, C compiler will have parser specific to C language grammar.
- When used for compiler implementation, parsers generate output what is known as symbol table. Symbol table is used for rest of the compiler stages (e.g. optimization, machine code generation etc.).
- It should be noted that Lexars and Parsers (or Flex and Bison) are not always written to implement compilers. They can be used for variety of other applications to extract information from a code. A very good example is Synthesis tools used in ASIC/FPGA flows. Here, Parser’s are used to parse Verilog/VHDL/SV RTL code and convert that into gate level representation.
We have now seen how a Parser works. So, in summary, a parser is specific to the grammar of a language. It takes user input (typically a program) and then creates processed output. This is conceptually shown in the figure 1.
Think of Randsequence as complete opposite of parser. It basically takes grammar as input and generates possible legal input. To put in simple terms, if you define production rules using randsequence then it will automatically generate correct sequence as per the production rules. These sequences will indirectly form stimulus to the DUT. randsequence is conceptually shown in figure 2. Please note that, the grammar in the context of verification can refer to possible sequence to generate a processor instruction or sequence to generate a networking sequence such as TCP/IP. Examples for these are given in the subsequent section.
It is important to clarify that the term sequence here does not refer to UVM/OVM sequence but refers to an abstract term to signify the order of input to the DUT.
randsequence is very good in generation of sequences. It also provides certain features that are essential to constrained random methodology. These are explained below:
- Apart from maintaining a certain order, there are cases, where user does not care the exact sequence as long as overall sequence is maintained. Randsequence allows users to choose certain paths randomly while still maintaining overall order. This allows one to bring in a certan level of randomness while still maintaining overall order. Refer to “|” operator in randsequence syntax.
- As randsequence generates one legal sequence from all possible legal sequences, it also allows to specify weights in order to bias the events towards certain sequence(s). This is to give more weightate to certain sequence that are more interesting than others. Refer to “:=” operator in randsequence syntax.
- One can also repeat certain sequence any number of times. Refer to “repeat” operator in randsequence syntax.
Figure 1 Comparing Parser & randsequence flow
- One can conditionally select a sub-sequence if one desires. This is useful when external control of the randsequence is desired. Refer to “if”/”else” statements in randsequence syntax.
- one can select within multiple sub-sequences. Refer to “rand/join”.
Having seen randsequence features, it is important to understand when this construct is useful in verification. The example below illustrates typical uses of Randsequence.
Example 1: where it is essential to generate strict sequences for verification. A good example here is, sequence of CPU instructions in order to verify a processor sub-system. Typically, verification engineers would employ random variables and then apply constraints to generate desired instruction sequences. While this is perfectly fine, it is still not the most optimal approach. This method is not only complicated but also hard to maintain and hard to scale.
Generation of CPU instructions is shown using a simple example. Figure 2 shows a tree diagram while figure 3 shows actual SV code to implementing this example.
One can continue to extend this simple example to create more complex instruction sequence resulting in more complex assembly programs.
Figure 2: CPU instruction graph
Figure 3: SV code to generate processor instructions using randsequence
Example 2: At the SoC level, strict sequencing driven verification is becoming important essentially due to the fact that larger chunk of system functionality is being pushed into SoC. So, it is no longer sufficient to test SoC as a collection of disjoint IP blocks but rather in proper use-cases which involves interactions among these IP’s.
A good example of use-case driven verification is, say, when one has to generate TCP/IP type of traffic for a given DUT. Assuming that basic TCP/IP sequences are already present then SoC level test will mainly deal with building a top level sequence or test layer that generates proper sequences as per TCP/IP protocol hierarchy.
TCP/IP protocol hierarchy is shown in the figure 5 in a tree form. One has to select from the possible protocol tree paths shown in the figure. randsequence code snippet to generate TCP/IP sequence is shown in figure 6.
Figure 4: Graph to generate TCP/IP network traffic
Figure 5: SV code to generate TCP/IP network traffic using randsequence
Having seen advantages of randsequence, we also need to understand its shortcomings before one decides to deploy it. Randsequence is a procedural construct. So, one cannot extend existing randsequence to add additional functionality. Of course, by the virtue of the fact that randsequence is defined within SV classes, one can extend the class that contains randsequence and add additional behavior.
The primary goal of a verification is to find every possible defect in a design, in a reasonable amount of time. Languages like SystemVerilog are mere tools towards this endeavor; choosing the right tool (or the right constructs) can be the difference between a robust silicon versus a shaky silicon. Randsequence is one such construct that can be highly effective for verification. This is more relevant today as much more system functionality is being packed into silicon than ever before requiring close to system level simulation.
If you wish to download a copy of this white paper, click here