Advanced ASIC Chip Synthesis - Bhatnagar


ADVANCED ASIC CHIP SYNTHESIS

Using Synopsys® Design Compiler™, Physical Compiler™ and PrimeTime®

SECOND EDITION


Trademark Information

UNIX is a registered trademark of UNIX Systems Laboratories, Inc.
Verilog is a registered trademark of Cadence Design Systems, Inc.
RSPF and DSPF are trademarks of Cadence Design Systems, Inc.
SDF and SPEF are trademarks of Open Verilog International.

Synopsys, PrimeTime, Formality, DesignPower, DesignWare and SOLV-IT! are registered trademarks of Synopsys, Inc.

Design Analyzer, Design Vision, Physical Compiler, Design Compiler, DFT Compiler, VHDL Compiler, HDL Compiler, ECO Compiler, Library Compiler, Synthetic Libraries, DesignTime, Floorplan Manager, characterize, dont_touch, dont_touch_network and uniquify are trademarks of Synopsys, Inc.

SolvNET is a service mark of Synopsys, Inc.

All other brand or product names mentioned in this document are trademarks or registered trademarks of their respective companies or organizations.

All ideas and concepts provided in this book are the author's own, and are not endorsed by Synopsys, Inc. Synopsys, Inc. is not responsible for information provided in this book.


Himanshu Bhatnagar
Conexant Systems, Inc.

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 0-306-47507-3
Print ISBN: 0-7923-7644-7

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2002 Kluwer Academic Publishers, Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

To my wife Nivedita
and my daughter Nayan

Contents

Foreword
Preface
Acknowledgements
About The Author

CHAPTER 1: ASIC DESIGN METHODOLOGY
1.1 Traditional Design Flow
    1.1.1 Specification and RTL Coding
    1.1.2 Dynamic Simulation
    1.1.3 Constraints, Synthesis and Scan Insertion
    1.1.4 Formal Verification
    1.1.5 Static Timing Analysis using PrimeTime
    1.1.6 Placement, Routing and Verification
    1.1.7 Engineering Change Order
1.2 Physical Compiler Flow
    1.2.1 Physical Synthesis
1.3 Chapter Summary

CHAPTER 2: TUTORIAL
2.1 Example Design
2.2 Initial Setup
2.3 Traditional Flow
    2.3.1 Pre-Layout Steps
    2.3.2 Post-Layout Steps
2.4 Physical Compiler Flow
2.5 Chapter Summary

CHAPTER 3: BASIC CONCEPTS
3.1 Synopsys Products
3.2 Synthesis Environment
    3.2.1 Startup Files
    3.2.2 System Library Variables
3.3 Objects, Variables and Attributes
    3.3.1 Design Objects
    3.3.2 Variables
    3.3.3 Attributes
3.4 Finding Design Objects
3.5 Synopsys Formats
3.6 Data Organization
3.7 Design Entry
3.8 Compiler Directives
    3.8.1 HDL Compiler Directives
    3.8.2 VHDL Compiler Directives
3.9 Chapter Summary

CHAPTER 4: SYNOPSYS TECHNOLOGY LIBRARY
4.1 Technology Libraries
    4.1.1 Logic Library
    4.1.2 Physical Library
4.2 Logic Library Basics
    4.2.1 Library Group
    4.2.2 Library Level Attributes
    4.2.3 Environment Description
    4.2.4 Cell Description
4.3 Delay Calculation
    4.3.1 Delay Model
    4.3.2 Delay Calculation Problems
4.4 What is a Good Library?
4.5 Chapter Summary

CHAPTER 5: PARTITIONING AND CODING STYLES
5.1 Partitioning for Synthesis
5.2 What is RTL?
    5.2.1 Software versus Hardware
5.3 General Guidelines
    5.3.1 Technology Independence
    5.3.2 Clock Related Logic
    5.3.3 No Glue Logic at the Top
    5.3.4 Module Name Same as File Name
    5.3.5 Pads Separate from Core Logic
    5.3.6 Minimize Unnecessary Hierarchy
    5.3.7 Register All Outputs
    5.3.8 Guidelines for FSM Synthesis
5.4 Logic Inference
    5.4.1 Incomplete Sensitivity Lists
    5.4.2 Memory Element Inference
    5.4.3 Multiplexer Inference
    5.4.4 Three-State Inference
5.5 Order Dependency
    5.5.1 Blocking versus Non-Blocking Assignments in Verilog
    5.5.2 Signals versus Variables in VHDL
5.6 Chapter Summary

CHAPTER 6: CONSTRAINING DESIGNS
6.1 Environment and Constraints
    6.1.1 Design Environment
    6.1.2 Design Constraints
6.2 Advanced Constraints
6.3 Clocking Issues
    6.3.1 Pre-Layout
    6.3.2 Post-Layout
    6.3.3 Generated Clocks
6.4 Putting it Together
6.5 Chapter Summary

CHAPTER 7: OPTIMIZING DESIGNS
7.1 Design Space Exploration
7.2 Total Negative Slack
7.3 Compilation Strategies
    7.3.1 Top-Down Hierarchical Compile
    7.3.2 Time-Budgeting Compile
    7.3.3 Compile-Characterize-Write-Script-Recompile
    7.3.4 Design Budgeting
7.4 Resolving Multiple Instances
7.5 Optimization Techniques
    7.5.1 Compiling the Design
    7.5.2 Flattening and Structuring
    7.5.3 Removing Hierarchy
    7.5.4 Optimizing Clock Networks
    7.5.5 Optimizing for Area
7.6 Chapter Summary

CHAPTER 8: DESIGN FOR TEST
8.1 Types of DFT
    8.1.1 Memory and Logic BIST
    8.1.2 Boundary Scan DFT
8.2 Scan Insertion
    8.2.1 Shift and Capture Cycles
    8.2.2 RTL Checking
    8.2.3 Making Design Scannable
    8.2.4 Existing Scan
    8.2.5 Scan Chain Ordering
    8.2.6 Test Pattern Generation
    8.2.7 Putting it Together
8.3 DFT Guidelines
    8.3.1 Tri-State Bus Contention
    8.3.2 Latches
    8.3.3 Gated Reset or Preset
    8.3.4 Gated or Generated Clocks
    8.3.5 Use Single Edge of the Clock
    8.3.6 Multiple Clock Domains
    8.3.7 Order Scan-Chains to Minimize Clock Skew
    8.3.8 Logic Un-Scannable due to Memory Element
8.4 Chapter Summary

CHAPTER 9: LINKS TO LAYOUT & POST LAYOUT OPT.
9.1 Generating Netlist for Layout
    9.1.1 Uniquify
    9.1.2 Tailoring the Netlist for Layout
    9.1.3 Remove Unconnected Ports
    9.1.4 Visible Port Names
    9.1.5 Verilog Specific Statements
    9.1.6 Unintentional Clock or Reset Gating
    9.1.7 Unresolved References
9.2 Layout
    9.2.1 Floorplanning
    9.2.2 Clock Tree Insertion
    9.2.3 Transfer of Clock Tree to Design Compiler
    9.2.4 Routing
    9.2.5 Extraction
9.3 Post-Layout Optimization
    9.3.1 Back Annotation and Custom Wire Loads
    9.3.2 In-Place Optimization
    9.3.3 Location Based Optimization
    9.3.4 Fixing Hold-Time Violations
9.4 Chapter Summary

CHAPTER 10: PHYSICAL SYNTHESIS
10.1 Initial Setup
    10.1.1 Important Variables
10.2 Modes of Operation
    10.2.1 RTL 2 Placed Gates
    10.2.2 Gates to Placed Gates
10.3 Other PhyC Commands
10.4 Physical Compiler Issues
10.5 Back-End Flow
10.6 Chapter Summary

CHAPTER 11: SDF GENERATION
11.1 SDF File
11.2 SDF File Generation
    11.2.1 Generating Pre-Layout SDF File
    11.2.2 Generating Post-Layout SDF File
    11.2.3 Issues Related to Timing Checks
    11.2.4 False Delay Calculation Problem
    11.2.5 Putting it Together
11.3 Chapter Summary

CHAPTER 12: PRIMETIME BASICS
12.1 Introduction
    12.1.1 Invoking PT
    12.1.2 PrimeTime Environment
    12.1.3 Automatic Command Conversion
12.2 Tcl Basics
    12.2.1 Command Substitution
    12.2.2 Lists
    12.2.3 Flow Control and Loops
12.3 PrimeTime Commands
    12.3.1 Design Entry
    12.3.2 Clock Specification
    12.3.3 Timing Analysis Commands
    12.3.4 Other Miscellaneous Commands
12.4 Chapter Summary

CHAPTER 13: STATIC TIMING ANALYSIS
13.1 Why Static Timing Analysis?
    13.1.1 What to Analyze?
13.2 Timing Exceptions
    13.2.1 Multicycle Paths
    13.2.2 False Paths
13.3 Disabling Timing Arcs
    13.3.1 Disabling Timing Arcs Individually
    13.3.2 Case Analysis
13.4 Environment and Constraints
    13.4.1 Operating Conditions – A Dilemma
13.5 Pre-Layout
    13.5.1 Pre-Layout Clock Specification
    13.5.2 Timing Analysis
13.6 Post-Layout
    13.6.1 What to Back Annotate?
    13.6.2 Post-Layout Clock Specification
    13.6.3 Timing Analysis
13.7 Analyzing Reports
    13.7.1 Pre-Layout Setup-Time Analysis Report
    13.7.2 Pre-Layout Hold-Time Analysis Report
    13.7.3 Post-Layout Setup-Time Analysis Report
    13.7.4 Post-Layout Hold-Time Analysis Report
13.8 Advanced Analysis
    13.8.1 Detailed Timing Report
    13.8.2 Cell Swapping
    13.8.3 Bottleneck Analysis
    13.8.4 Clock Gating Checks
13.9 Chapter Summary

APPENDIX A

APPENDIX B

INDEX

Foreword

The semiconductor industry has a proven track record of quickly reducing reference to IC design scale to the ridiculously irrelevant. We, as a group, quickly saturated our terminology to refer to levels of integration as we applied the term "Large Scale Integration" (LSI) in the mid 80's to chips containing more than 1,000 transistors and moved to the more progressive "Very Large Scale" (VLSI) as we passed into the 10,000 to 100,000 transistor territory only a year or two later. A few more attempts at renaming our design output with terms such as ULSI (Ultra-Large Scale Integration) were fortunately left in the annals of history as the more insightful realized that the consequences of Moore's Law would quickly require us to move beyond the confines of the English language to create appropriate superlatives. We, however, could not resist changing our conception of the design task by coining the phrase "System on a Chip" in the early to mid 90's to convey the understanding that these levels of integration allowed for the development not merely of complex electronic components but of self-contained information processing systems. Once again, however, we find ourselves struggling with the reality that the "systems" referred to only 3 to 4 years ago are today barely enough to fill the pad ring of a modest pin count device.

We should not be surprised, therefore, that some in the design community are recognizing the need to again rethink and redefine the metrics that scope the modern IC design task. Now, however, instead of focusing on the collection of transistors or functions as a metric of design production, this group has moved to focus on our most precious commodity, time. For them, today's design task is being defined by the window of opportunity in which the design output is relevant, usually a period that cannot extend beyond 12 - 18 months. This group, therefore, is focused on the tools and techniques that can raise the design productivity to the point that the transistor counts, functions and subsystems that can fill the silicon can be reliably designed and characterized in this amount of time. We should not be caught completely by surprise therefore if our lexicon begins to define levels of integration with terms such as SMSI ("Six Month Scale Integration") or TMSI ("Twelve Month Scale Integration") perhaps.

This book sets itself squarely in the middle of this effort as it explores and conveys a collection of tools and techniques focused on dramatically reducing the time required to complete the IC design task and get an IC product to market. The author, Mr. Bhatnagar, takes a set of today’s most productive IC design tools and exposes ways in which these tools can be applied to further streamline the full design process. These techniques challenge the designer to move beyond linear high-level design flows that utilize HDL languages for design description, synthesis to create gate and transistor implementation and timing analysis. This book exposes practical techniques by which more information can either be introduced sooner in the design flow or fed back quicker in order to both reduce the number of iterations and the complexity associated with each one. The result is techniques that lead to better quality designs sooner.

Today's semiconductor business operates in the world of compressed time and hyper-competition. To compete effectively in this world every designer and every design team is well advised to focus on continually improving their time-to-market metric. This book will serve the advanced student in VLSI design as well as the seasoned practitioner in this quest.

Mr. F. Matthew Rhodes
Sr. Vice President and General Manager
Personal Computing Division
Conexant Systems, Inc.

Preface

The second edition of this book describes the advanced concepts and techniques used for ASIC chip synthesis, physical synthesis, formal verification and static timing analysis, using the Synopsys suite of tools. In addition, the entire ASIC design flow methodology targeted for VDSM (Very-Deep-Sub-Micron) technologies is covered in detail.

The emphasis of this book is on the practical application of Synopsys tools to combat various problems seen at VDSM geometries. Readers will be exposed to an effective design methodology for handling complex, sub-micron ASIC designs. Significance is placed on HDL coding styles, synthesis and optimization, dynamic simulation, formal verification, DFT scan insertion, links to layout, physical synthesis, and static timing analysis. At each step, problems related to each phase of the design flow are identified, with solutions and work-arounds described in detail. In addition, crucial issues related to layout, which include clock tree synthesis and back-end integration (links to layout), are also discussed at length. Furthermore, the book contains in-depth discussions on the basics of Synopsys technology libraries and HDL coding styles, targeted towards an optimal synthesis solution.

The target audience for this book is practicing ASIC design engineers and master's level students undertaking advanced VLSI courses on ASIC chip design and DFT techniques.

This book is not intended as a substitute or a replacement for the Synopsys reference manual, but is meant for anyone who is involved in the ASIC design flow. It is also useful for those designers (and companies) who do not have layout capability, or their own technology libraries, but rely on outside vendors for back-end integration and final fabrication of the device. The book provides alternatives to the traditional method of netlist hand-off to outside vendors because of various problems related to VDSM technologies. It also addresses solutions to common problems faced by designers when interfacing various tools from different EDA tool vendors.

All commands have been updated to the Tcl version of Design Compiler in this edition of the book. The commands have been changed to reflect the most up-to-date version (2000.11-SP1) of the Synopsys suite of tools.

Overview of the Chapters

Chapter 1 presents an overview of the various stages involved in the ASIC design flow using Synopsys tools. The entire design flow is briefly described, starting from concept to chip tape-out. This chapter is useful for designers who have not delved into the full process of chip design and integration, but would like to learn the full ASIC design flow.

Chapter 2 outlines the practical aspects of the ASIC design flow as described in Chapter 1. Beginners may use this chapter as a tutorial. Advanced users of Synopsys tools may benefit by using this chapter as a reference. Users with no prior experience in synthesis using Synopsys tools should skip this chapter and return to it later after reading the rest of the chapters.

The basic concepts related to synthesis are described in detail in Chapter 3. These concepts introduce the reader to synthesis terminology used throughout the later chapters. Readers will find the information provided here useful for gaining a basic understanding of these tools and their environment. In addition to describing the purpose of each tool and its setup, this chapter also focuses on defining objects, variables, attributes and compiler directives used by Design Compiler.

Chapter 4 describes the basics of the Synopsys technology library. Designers usually do not concern themselves with the full details of the technology library, as long as the library contains a variety of cells with different drive strengths. However, a rich library usually determines the quality of synthesis. Therefore, the intent of this chapter is to describe the Synopsys technology library from the designer’s perspective. The focus is on the delay calculation method and other techniques that designers may use to alter the behavior of the technology library, and hence the quality of the synthesized design.

Proper partitioning and good coding style are essential in obtaining quality results. Chapter 5 provides guidelines on various techniques that may be used to correctly partition the design in order to achieve the optimal solution. In addition, HDL coding styles are covered in this chapter, which illustrates numerous examples and provides recommendations to designers on how to code the design in order to produce faster logic and minimum area.

The Design Compiler commands used for synthesis and optimization are described in Chapter 6. This chapter contains information that is useful for both novice and advanced users of Synopsys tools. The chapter focuses on real-world applications by taking into account deviations from the ideal situation, i.e., “Not all designs or designers follow Synopsys recommendations”. The chapter illustrates numerous examples that help guide the user in the practical application of the commands.

Chapter 7 discusses optimization techniques for meeting timing and area requirements. A comparison between the older version of Design Compiler and the new version is highlighted. Emphasis is placed on the new optimization technique employed by Design Compiler called “TNS”. Also, a detailed analysis of various methods used for optimizing logic is presented. In addition, different compilation strategies, each with advantages and disadvantages, are discussed in detail.

DFT techniques are increasingly gaining momentum among ASIC design engineers. Chapter 8 provides a brief overview of the different types of DFT techniques that are in use today, followed by a detailed description of how devices can be made scannable using Synopsys’s Test Compiler. It describes commands used for inserting scan through Design Compiler. A multitude of guidelines is presented in order to alleviate the problems related to DFT scan insertion on a design.

Chapter 9 discusses the links to layout feature of Design Compiler. It describes the interface between the front-end and back-end tools. Also, this chapter provides different strategies used for post-layout optimization of designs. This includes in-place and location based optimization techniques. Furthermore, a section is devoted to clock tree insertion and problems related to clock tree transfer to Design Compiler. Various solutions to this common problem are described. This chapter is extremely valuable for designers (and companies) who do not possess their own layout tool, but would like to learn the place and route process along with full chip integration techniques.

The introduction of Physical Compiler dramatically changed the traditional approach to synthesis. Chapter 10 describes this flow in detail. The chapter describes various methods of achieving optimal results using Physical Compiler. In order to understand the Physical Compiler flow, readers are advised to read all chapters related to the traditional flow (especially Chapter 9) before reading this chapter. This will help correlate the topics described in this chapter to the traditional flow. Various example scripts are provided in this chapter illustrating the usage of this novel tool.

Chapter 11, titled “SDF Generation: for Dynamic Timing Simulation”, describes the process of generating the SDF file from Design Compiler or PrimeTime. A section is devoted to the syntax of the SDF format, followed by a detailed discussion of the process of SDF generation, both for the pre-layout and post-layout phases of the design. In addition, a few innovative ideas and suggestions are provided to help designers perform successful simulation. This chapter is useful for those designers who prefer the dynamic simulation method to formal verification techniques in order to verify the functionality of the design.

Chapter 12 introduces the reader to the basics of static timing analysis using PrimeTime. This includes a brief section devoted to the Tcl language that is utilized by PrimeTime. Also described in this chapter are selected PrimeTime commands that are used to perform static timing analysis, and that also help the designer debug the design for possible timing violations.

The key to working silicon usually lies in successful completion of static timing analysis performed on a particular design. This capability makes static timing analysis one of the most important steps in the entire design flow and is used by many designers as a sign-off criterion to the ASIC vendor. Chapter 13 is devoted to several basic and advanced topics on static timing analysis, using PrimeTime. It effectively illustrates the usage of PrimeTime, both for the pre-layout and the post-layout phases of the ASIC design flow process. In addition, numerous examples on analyzing reports and suggestions on various scenarios are provided. This chapter is useful to those who would like to migrate from traditional methods of dynamic simulation to the method of analyzing designs statically. It is also helpful for those readers who would like to perform in-depth analysis of the design through PrimeTime.

Conventions Used in the Book

All Synopsys commands are typed in “Arial” font. This includes all examples that contain synthesis and timing analysis scripts.

The command line prompt is typed in “Courier New” font. For example:

dc_shell> and pt_shell>

Option values for some of the commands are enclosed in < and >. In general, these values need to be replaced before the command can be used. For example:

set_false_path -from <from list> -to <to list>

The “\” character is used to denote line continuation, whereas the “|” character represents the “OR” function. For example:

compile -map_effort low | medium | high \
        -incremental_mapping
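As an illustration of these conventions, the two templates above might be filled in as in the following sketch. The port names (async_in, status_out) are hypothetical and chosen only for this example, and the effort level is an arbitrary pick from the listed alternatives.

```tcl
# Placeholders in < > replaced by real object lists.
# async_in and status_out are hypothetical port names.
set_false_path -from [get_ports async_in] -to [get_ports status_out]

# Exactly one of low | medium | high is chosen; the trailing
# backslash continues the command on the next line.
compile -map_effort medium \
        -incremental_mapping
```

Note that the “|” and the angle brackets are notation only and never appear in a command as it is actually typed.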

Wherever possible, keywords are italicized. Topics or points that need emphasis are underlined or highlighted in bold font.

Acknowledgements

I would like to express my heartfelt gratitude to a number of people who contributed their time and effort towards this book. Without their help, it would have been impossible to take on this enormous undertaking.

First and foremost, a special thanks to my family, who gave me continuous support and encouragement that kept me constantly motivated towards the completion of this project. My wife Nivedita, who patiently withstood my nocturnal and weekend writing activities, spent an enormous amount of time proofreading the manuscript and correcting my "Engineer's English". I could not have accomplished this task without her help and understanding.

I would like to thank my supervisor, Anil Mankar, for giving me ample latitude at work to write the book. His moral support and innovative suggestions kept me alert and hopeful. I would also like to thank my colleagues at Conexant: Khosrow Golshan, who helped me design the front cover of the book and also provided me inescapable suggestions for the backend design flow; and Young Lee, Hoat Nguyen, Vinson Chua, Hien Truong, Songhua Xu, Chilan Nguyen, Randy Kolar, Steve Schulz, Richard Ward, Sameer Rao, Chih-Shun Ding and Ravi Ranjan, who devoted their precious time to reviewing the manuscript.

I was extremely fortunate to have an outstanding reviewer for this project, Dr. Kelvin F. Poole (Clemson University, S.C.). I have known Dr. Poole for a number of years and approached him for his guidance while writing this book. He not only proofread the entire manuscript word-by-word (gritting his teeth, I'm sure!), but also provided valuable suggestions, which helped make the book more robust. Thank you Dr. Poole.

I wish to express my thanks to Bill Mullen, Ahsan Bootehsaz, Steve Meier, Russ Segal, Juergen Froessl, Elisabeth Moseley, Kelly Conklin, Bob Moussavi and Amanda Hsiao at Synopsys, who participated in reviewing this manuscript and provided me with many valuable suggestions. Julie Liedtke and Bryn Ekroot of Synopsys helped me write the necessary Trademark information. Special thanks are also due to Jeff Echtenkamp, Heratch Avakian, Chung-Jue Chen and Chin-Sieh Lee of Broadcom Corporation for providing me valuable feedback and engaging in lengthy technical discussions. Thanks are also due to Kameshwar Rao (Consultant), Jean-Claude Marin (ST Microelectronics, France), Tapan Mohanti (Centillium Communications), Dr. Sudhir Aggarwal (Philips Semiconductors) and Abu Horaira (Intel Corporation) for giving me positive feedback at all times. Their endless encouragement is very much appreciated.

During SNUG 2000, I met Cliff Cummings (President & Consultant, Sunburst Designs). Cliff is very well known in this industry as an expert in Verilog RTL coding and synthesis. I asked him to help me review certain chapters of my book. I would like to thank him for providing valuable suggestions, which I incorporated in Chapter 5.

Writing the second edition of this book took longer than previously anticipated. The main reason was the introduction of Physical Compiler. I wanted to enhance the book but did not want to write about something that was not mature. Carl Harris of Kluwer Academic Publishers understood this and supported me throughout the project. His understanding, even when I kept on delaying the book, is appreciated.

A final word, “Thank you Mom and Dad for your endless faith in me”.

Himanshu Bhatnagar
Conexant Systems, Inc.
Newport Beach, California

About The Author

Himanshu Bhatnagar is an ASIC Design Group Leader at Conexant Systems, Inc. based in Newport Beach, California, U.S.A. Conexant Systems Inc. is the world’s largest independent company focused exclusively on providing semiconductor products for communication electronics. Himanshu has been instrumental in defining the next generation ASIC design flow methodologies using the latest high performance tools from Synopsys and other EDA tool vendors.

Before joining Conexant, Himanshu worked for ST Microelectronics in Singapore and at the corporate headquarters based in Grenoble, France. He completed his undergraduate degree in Electronics and Computer Science at Swansea University (Wales, U.K.), and his master's degree in VLSI design at Clemson University (South Carolina, USA).

1

ASIC DESIGN METHODOLOGY

As deep sub-micron semiconductor geometries shrink, traditional methods of chip design have become increasingly difficult. In addition, an increasing number of transistors is being packed into the same die size, making validation of the design extremely hard, if not impossible. Furthermore, under critical “time-to-market” pressure, the chip design cycle has remained the same or is constantly being reduced. To counteract these problems, new methods and tools have evolved to facilitate the ASIC design methodology.

The main function of this chapter is to bring to the forefront the different stages involved in chip design as we move deeper into the sub-micron realm. Various techniques that improve the design flow are also discussed.

Since the last edition of this book, Synopsys has introduced another tool called Physical Compiler. In this tool, synthesis and placement are more tightly coupled. Consequently, there is a dramatic change in the traditional design flow. This chapter stresses the importance of the new techniques to the reader, and explains the necessity of these techniques in the design flow to achieve the maximum benefit by reducing the overall cycle time. Since the tool is fairly new to the IC design world, and as yet not embraced 100% by the ASIC design community, both the traditional and the new flows are discussed.

This chapter focuses on the entire synthesis based ASIC design flow methodology, from RTL coding to the final tape-out. Both the traditional and the Physical Compiler based flows are discussed.

1.1 Traditional Design Flow

The traditional ASIC design flow contains the steps outlined below. Figure 1-1 illustrates the flow chart relating to the design flow described below. Subsequent chapters describe synthesis related topics in detail.

1. Architectural and electrical specification.
2. RTL coding in HDL.
3. DFT memory BIST insertion, for designs containing memory elements.
4. Exhaustive dynamic simulation of the design, in order to verify the functionality of the design.
5. Design environment setting. This includes the technology library to be used, along with other environmental attributes.
6. Constraining and synthesizing the design with scan insertion (and optional JTAG) using Design Compiler.
7. Block level static timing analysis, using Design Compiler’s built-in static timing analysis engine.
8. Formal verification of the design. RTL compared against the synthesized netlist, using Formality.
9. Pre-layout static timing analysis on the full design through PrimeTime.
10. Forward annotation of timing constraints to the layout tool.
11. Initial floorplanning with timing driven placement of cells, clock tree insertion and global routing.
12. Transfer of clock tree to the original design (netlist) residing in Design Compiler.
13. In-place optimization of the design in Design Compiler.
14. Formal verification between the synthesized netlist and clock tree inserted netlist, using Formality.
15. Extraction of estimated timing delays from the layout after the global routing step (step 11).
16. Back annotation of estimated timing data from the global routed design, to PrimeTime.
17. Static timing analysis in PrimeTime, using the estimated delays extracted after performing global route.
18. Detailed routing of the design.
19. Extraction of real timing delays from the detailed routed design.
20. Back annotation of the real extracted timing data to PrimeTime.
21. Post-layout static timing analysis using PrimeTime.
22. Functional gate-level simulation of the design with post-layout timing (if desired).
23. Tape out after LVS and DRC verification.

Figure 1-1 graphically illustrates the typical ASIC design flow discussed above. The acronyms STA and CT represent static timing analysis and clock tree, respectively. DC represents Design Compiler.
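To make the Design Compiler portion of the flow more concrete, steps 5 through 7 might look roughly like the following dc_shell (Tcl mode) sketch. It is only an outline under stated assumptions: the library, file, design and port names (my_tech.db, my_block.v, my_block, clk) and all numeric values are hypothetical, scan insertion is left out, and exact command options can differ between tool versions.

```tcl
# Sketch of steps 5-7 in dc_shell Tcl mode.
# All names and numbers below are hypothetical placeholders.

# Step 5: design environment (technology and link libraries).
set target_library {my_tech.db}
set link_library   {* my_tech.db}

# Read the RTL and select the block to synthesize.
read_verilog my_block.v
current_design my_block
link

# Step 6: constrain and synthesize (scan insertion omitted here).
create_clock -period 10 -name clk [get_ports clk]
set data_inputs [remove_from_collection [all_inputs] [get_ports clk]]
set_input_delay  2.0 -clock clk $data_inputs
set_output_delay 2.0 -clock clk [all_outputs]
compile -map_effort medium

# Step 7: block-level static timing analysis inside Design Compiler.
report_timing

# Save the mapped netlist for the later steps of the flow.
write -format verilog -hierarchy -output my_block_gates.v
```

The later chapters on constraining, optimizing and analyzing designs cover each of these commands, and the remaining steps of the flow, in detail.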

1.1.1 Specification and RTL Coding

Chip design commences with the conception of an idea dictated by the market. These ideas are then translated into architectural and electrical specifications. The architectural specifications define the functionality and partitioning of the chip into several manageable blocks, while the electrical specifications define the relationship between the blocks in terms of timing information.

The next phase involves the implementation of these specifications. In the past this was achieved by manually drawing the schematics, utilizing the components found in a cell library. This process was time consuming and was impractical for design reuse. To overcome this problem, hardware description languages (HDL) were developed. As the name suggests, the functionality of the design is coded using the HDL. There are two main HDLs in use today, Verilog and VHDL. Both languages perform the same function, and each has its own advantages and disadvantages.

There are three levels of abstraction that may be used to represent the design;Behavioral, RTL (Register Transfer Level) and Structural. The Behavioral

Page 28: Advanced ASIC Chip Synthesis - Bhatnagar

ASIC DESIGN METHODOLOGY 5

level code is at a higher level of abstraction. It is used primarily fortranslating the architectural specification, to a code that can be simulated.Behavioral coding is initially performed to explore the authenticity andfeasibility of the chosen implementation for the design. Conversely, the RTLcoding actually describes and infers the structural components and theirconnections. This type of coding is used to describe the functionality of thedesign and is synthesizable to form a structural netlist. This netlist comprisesof the components from a target library and their respective connections;very similar to the schematic based approach.

The design is coded using the RTL style, in either Verilog or VHDL, or both.It can also be partitioned if necessary, into a number of smaller blocks toform a hierarchy, with a top-level block connecting all lower level blocks.

Synopsys recently introduced Behavior Compiler, capable of synthesizing Behaviorlevel style of coding. Since this is a major topic of discussion and is not relevant tothis book, only RTL related synthesis is covered in this book.

1.1.2 Dynamic Simulation

The next step is to check the functionality of the design by simulating the RTL code. All currently available simulators are capable of simulating the behavior level as well as RTL level coding styles. In addition, they are also used to simulate the mapped gate-level design.

Figure 1-2 illustrates a partitioned design surrounded by a test bench ready for simulation. This test bench is normally written in behavior HDL while the actual design is coded in RTL.

Usually the simulators are language dependent (either Verilog or VHDL), although there are a few simulators in the market capable of simulating a mixed HDL design.

The purpose of the test bench is to provide necessary stimuli to the design. It is important to note that the coverage of the design is totally dependent on the number of tests performed and the quality of the test bench. This is the reason why a sound test bench is extremely critical to the design. During the simulation of the RTL, the component (or gate) timing is not considered. Therefore, to minimize the difference between the RTL simulation and the synthesized gate-level simulation at a later stage, delays are often coded within the RTL source, usually for sequential elements.

1.1.3 Constraints, Synthesis and Scan Insertion

For a long time, the HDLs were used for logic verification. Designers would manually translate the HDL into schematics and draw the interconnections between the components to produce a gate-level netlist. With the advent of synthesis tools, this manual task has been rendered obsolete. The tool has taken over and performs the task of reducing the RTL to the gate-level netlist. This process is termed synthesis.

Synopsys's Design Compiler (from now on termed DC) is the de facto standard and by far the most popular synthesis tool in the ASIC industry today.

Synthesizing a design is an iterative process and begins with defining timing constraints for each block of the design. These timing constraints define the relationship of each signal with respect to the clock input for a particular block. In addition to the constraints, a file defining the synthesis environment is also needed. The environment file specifies the technology cell libraries and other relevant information that DC uses during synthesis.

DC reads the RTL code of the design and, using the timing constraints, synthesizes the code to structural level, thereby producing a mapped gate-level netlist. This concept is shown in Figure 1-3.

Usually, for small blocks of a design, DC's internal static timing analysis is used for reporting the timing information of the synthesized design. DC tries to optimize the design to meet the specified timing constraints. Further steps may be necessary if timing requirements are not met.

Most designs today incorporate design-for-test (DFT) logic to test their functionality after the chip is fabricated. The DFT consists of logic and memory BIST (built-in self-test), scan logic, Boundary Scan logic (JTAG), etc.

The logic and memory BIST comprises synthesizable RTL that is based upon controller logic and is incorporated in the design before synthesis. There are tools available in the market that may be used to generate the BIST controller and surrounding logic. Unfortunately, Synopsys does not provide this capability.

The scan insertion may be performed using the test ready compile feature of DC. This procedure maps the RTL directly to scan-flops, before linking them in a scan-chain. An advantage of using this feature is that it enables DC to take the scan-flop timing into account while synthesizing. This technique is important since the scan-flops generally have different delays associated with them as compared to their non-scan equivalent flops (or normal flops).

JTAG or boundary scan is primarily used for testing the board connections, without unplugging the chip from the board. The JTAG controller and surrounding logic may also be generated directly by DC.

1.1.4 Formal Verification

The concept of formal verification is fairly new to the ASIC design community. Formal verification techniques perform validation of a design using mathematical methods without the need for technological considerations, such as timing and physical effects. They check the logical functions of a design by comparing it against the reference design.

A number of EDA tool vendors have developed formal verification tools. Only recently has Synopsys also introduced its own formal verification tool, called Formality.

The main difference between formal methods and dynamic simulation is that the former technique verifies the design by proving that the structure and functionality of two designs are logically equivalent. Dynamic simulation methods can only probe certain paths of the design that are sensitized, and thus may not catch a problem present elsewhere. In addition, formal methods consume a negligible amount of time as compared to dynamic simulation.
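The difference in coverage can be illustrated with a toy example in plain Tcl. This is only a sketch: the two procedures ref and impl are invented for illustration, and checking every input combination stands in for the mathematical proof that a real equivalence checker performs without enumerating inputs. The directed vectors play the role of a test bench that happens not to sensitize the faulty path.

```tcl
# Reference function and a deliberately faulty implementation (both hypothetical).
proc ref  {a b c} { expr {$a & ($b | $c)} }
proc impl {a b c} { expr {$a & $b} }          ;# wrong: ignores input c

# A small directed test set, as a test bench might apply.
set vectors {{0 0 0} {1 1 0} {1 1 1}}
set sim_fail 0
foreach v $vectors {
    foreach {a b c} $v break
    if {[ref $a $b $c] != [impl $a $b $c]} { incr sim_fail }
}

# Every input combination, which is what an equivalence proof has to cover.
set all_fail 0
foreach a {0 1} { foreach b {0 1} { foreach c {0 1} {
    if {[ref $a $b $c] != [impl $a $b $c]} { incr all_fail }
}}}

puts "directed vectors: $sim_fail mismatches; all inputs: $all_fail mismatches"
```

The directed vectors report no mismatch, while the full comparison exposes the one input combination (a=1, b=0, c=1) on which the two functions differ.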

The purpose of formal verification in the design flow is to validate RTL against RTL, the gate-level netlist against the RTL code, or one gate-level netlist against another.

The RTL to RTL verification is used to validate the new RTL against the old functionally correct RTL. This is usually performed for designs that are subject to frequent changes in order to accommodate additional features. When these features are added to the source RTL, there is always a risk of breaking the old functionally correct feature. To prevent this, formal verification may be performed between the old RTL and the new RTL to check the validity of the old functionality.

The RTL to gate-level verification is used to ascertain that the logic has been synthesized accurately by DC. Since the RTL is dynamically simulated to be functionally correct, the formal verification of the design between the RTL and the scan inserted gate-level netlist assures us that the gate-level also has the same functionality. In this instance, if we were to use the dynamic simulation method to verify the gate-level, it would take a long time (days or weeks, depending on the size of the design). In comparison, the formal method would take a few hours to perform a similar verification.

The last part involves verifying the gate-level netlist against the gate-level netlist. This too is a significant step for the verification process, since it is mainly used to verify what has gone into the layout versus what has come out of the layout. What comes out of the layout is obviously the clock tree inserted netlist (flat or hierarchical). This means that the original netlist that goes into the layout tool is modified. The formal technique is used to verify the logic equivalency of the modified netlist against the original netlist.

1.1.5 Static Timing Analysis using PrimeTime

As previously mentioned, the block level static timing analysis is done using DC. Although the chip-level static timing can be performed using the above approach, it is recommended that PrimeTime be used instead. PrimeTime is the Synopsys stand-alone sign-off quality static timing analysis tool that is capable of performing extremely fast static timing analysis on full chip-level designs. It provides a Tcl interface that offers a powerful environment for analysis and debugging of designs.

The static timing analysis, to some extent, is the most important step in the whole ASIC design process. This analysis allows the user to exhaustively analyze all critical paths of the design and express them in an orderly report. Furthermore, the report can also contain other debugging information like the fanout or capacitive loading of each net.

The static timing is performed both for the pre and post-layout gate-level netlist. In the pre-layout mode, PrimeTime uses the wire load models specified in the library to estimate the net delays. During this, the same timing constraints that were fed to DC previously are also fed to PrimeTime, specifying the relationship between the primary I/O signals and the clock. If the timing for all critical paths is acceptable, then a constraints file may be written out from PrimeTime or DC for the purpose of forward annotation to the layout tool. This constraint file in SDF format specifies the timing between each group of logic that the layout tool uses, in order to perform the timing driven placement of cells.

In the post-layout mode, the actual extracted delays are back annotated to PrimeTime to provide realistic delay calculation. These delays consist of the net capacitances and interconnect RC delays.
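As a rough sketch of the post-layout step, the pt_shell fragment below back annotates both kinds of data and then reports timing. The design name my_chip and all file names are placeholders invented for this sketch. It also assumes that the link libraries and the timing constraints (clocks and I/O delays) have already been set up in the session.

```tcl
# Post-layout back annotation in pt_shell (sketch; all names are placeholders).
read_verilog my_chip_routed.v          ;# netlist returned by the layout tool
link_design  my_chip

# Net capacitances, assumed here to be supplied as a file of set_load commands.
source my_chip_set_load.pt
# Interconnect RC delays, assumed here to be supplied in SDF format.
read_sdf my_chip_rc.sdf

report_timing -delay_type max          ;# setup-time analysis
report_timing -delay_type min          ;# hold-time analysis
```

The exact file formats depend on what the layout and extraction tools can write out, so the two back annotation lines should be read as one possible arrangement rather than a fixed recipe.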

Similar to synthesis, static timing analysis is also an iterative process. It is closely linked with the placement and routing of the chip. This operation is usually performed a number of times until the timing requirements are satisfied.

1.1.6 Placement, Routing and Verification

As the name suggests, the layout tool performs the placement and routing. There are a number of methods in which this step could be performed. However, only issues related to synthesis are discussed in this section.

The quality of floorplan and placement is more critical than the actual routing. Optimal cell placement not only speeds up the final routing, but also produces superior results in terms of timing and reduced congestion. As explained previously, the constraint file is used to perform timing driven placement. The timing driven placement method forces the layout tool to place the cells according to the criticality of the timing between the cells.

After the placement of cells, the clock tree is inserted in the design by the layout tool. The clock tree insertion is optional and depends solely on the design and user's preference. Users may opt to use more traditional methods of routing the clock network, for example, using fishbone/spine structure for the clocks in order to reduce the total delay and skew of the clock. As technologies shrink, the spine approach is getting more difficult to implement due to the increase in resistance (thus, RC delays) of the interconnect wires. It is therefore the intent of this section (and the entire book) to focus solely on the clock tree synthesis approach.

At this stage an additional step is necessary to complete the clock tree insertion. As mentioned above, the layout tool inserted the clock tree in the design after the placement of cells. Therefore, the original netlist that was generated from DC (and fed to the layout tool) lacks the clock tree information (essentially the whole clock tree network, including buffers and nets). The clock tree must therefore be re-inserted in the original netlist and formally verified. Some layout tools provide a direct interface to DC to perform this step. Chapter 9 introduces some of these steps, both traditional and not-so-traditional approaches. For the sake of simplicity, let's assume that the clock tree insertion to the original netlist has been performed.

The layout tool generally performs routing in two phases: global routing and detailed routing. After placement, the design is globally routed to determine the quality of placement, and to provide estimated delays approximating the real delay values of the post-routed (after detailed routing) design. If the cell placement is not optimal, the global routing will take a longer time to complete, as compared to placing the cells. Bad placement also affects the overall timing of the design. Therefore, to minimize the number of synthesis-layout iterations and improve placement quality, the timing information is extracted from the layout after the global routing phase. Although these delay numbers are not as accurate as the numbers extracted after detailed routing, they do provide a fair idea of the post-routed timing. The estimated delays are back annotated to PrimeTime for analysis, and only when the timing is considered satisfactory is the remaining process allowed to proceed.

Detailed routing is the final step that is performed by the layout tool. After detailed route is complete, the real timing delays of the chip are extracted, and plugged into PrimeTime for analysis.

These steps are iterative and depend on the timing margins of the design. If the design fails timing requirements, post-layout optimization is performed on the design before undergoing another iteration of layout. If the design passes static timing analysis, it is ready to undergo LVS (layout versus schematic) and DRC (design rule checking) before tape-out.

It must be noted that all steps discussed above can also be applied for hierarchical place and route. In other words, one can repeat these steps for each sub-block of the design before placing the sub-blocks together in the final layout and routing between the sub-blocks.

1.1.7 Engineering Change Order

This step is an exception to the normal design flow and should not be confused with the regular design cycle. Therefore, this step will not be explained in subsequent chapters.

Many designers regard engineering change order (ECO) as the change required in the netlist at the very last stage of the ASIC design flow. For instance, ECO is performed when there is a hardware bug encountered in the design at the very last stage (say, after tape-out), and it is necessary to perform a metal mask change by re-routing a small portion of the design.

As a result, ECO is performed on a small portion of the chip to prevent disturbing the placement and routing of the rest of the chip, thereby preserving the rest of the chip's timing. Only the part that is affected is modified. This can be achieved either by targeting the spare gates incorporated in the chip, or by routing only some of the metal layers. This process is termed metal mask change.

Normally, this procedure is executed for changes that require less than 10% modification of the whole chip (or a block, if doing hierarchical place and route). If the bug fix requires more than 10% change, then it is best to repeat the whole procedure and re-route the chip (or the block).

The latest version of DC incorporates the ECO compiler. It makes use of the mathematical algorithms (also used by the formal verification techniques) to automatically implement the required changes. Making use of the ECO compiler provides designers an alternative to the tedium of manually inserting the required changes in the netlist, thus minimizing the turn-around time of the chip.

Some layout tools have incorporated the ECO algorithm within their tool. The layout tool has a built-in advantage in that it does not suffer from the limitation of crossing the hierarchical boundaries associated with a design. Also, the layout tool benefits from knowing the placement location of the spare cells (normally included by the designers in the design), and thus can target the nearest location of spare cells in order to implement the required ECO changes and achieve minimized routing.

1.2 Physical Compiler Flow

With shrinking semiconductor geometries, synthesis results based on wire-load models are getting too inaccurate and unpredictable. Physical Compiler, a new tool from Synopsys, bypasses this issue by integrating synthesis and placement within one common engine, thus avoiding the delay computation based on wire-load models.

The basic Physical Compiler design flow contains the steps outlined below. Figure 1-4 illustrates the flow chart relating to the design flow described below. Some commonality exists between the traditional flow and the Physical Compiler based flow, therefore only steps relevant to the Physical Compiler flow are outlined.

1. Design environment setting. This includes both the technology library and the physical library to be used, along with other environmental attributes.
2. Floorplan the design.
3. Constrain, synthesize (with scan insertion) and generate placement of the design using Physical Compiler.
4. Pre-layout static timing analysis using PrimeTime (delay numbers based on placement rather than wire-load models).
5. Formal verification of the design. RTL against the synthesized netlist, using Formality.
6. Port the netlist and the placement information over to the layout tool.
7. Insert clock tree in the design using the layout tool.
8. Formal verification between clock tree inserted netlist and the original scan inserted netlist.
9. Perform detailed routing using the layout tool.
10. Extract real timing delays from the detailed routed design.
11. Back-annotate the real extracted data to PrimeTime.
12. Post-layout static timing analysis using PrimeTime.
13. Functional gate-level simulation of the design with post-layout timing (if desired).
14. Tape out after LVS and DRC verification.


1.2.1 Physical Synthesis

Traditional synthesis methods are based on using the wire-load models. The basic nature of the wire-load models is such that they are fanout based. In other words, the delay computation of cells is performed based on the number of fanouts a cell drives. While this method was ideal for larger geometries (>0.35um), it is not suitable for smaller geometries. The resistance of wires is dominating the cell delays, causing the fanout based delay computation to be unreliable and totally unpredictable.
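The fanout-based estimate can be sketched in plain Tcl. The table values and the slope below are invented for illustration and do not come from any library, and the procedure name wlm_cap is likewise hypothetical. The point is only that the estimated net capacitance depends on the fanout count and nothing else, so two nets with the same fanout receive the same estimate no matter how far apart their cells end up being placed.

```tcl
# Toy wire-load model: estimated net capacitance (pF) indexed by fanout.
# All numbers are hypothetical and serve only to show the lookup.
set wlm_table {1 0.010 2 0.022 3 0.036 4 0.052}
set wlm_slope 0.018   ;# extra pF per fanout beyond the last table entry

proc wlm_cap {fanout} {
    global wlm_table wlm_slope
    array set cap $wlm_table
    # Fanout is assumed to be an integer >= 1.
    if {[info exists cap($fanout)]} {
        return $cap($fanout)
    }
    # Extrapolate linearly past the largest tabulated fanout (4 here).
    return [expr {$cap(4) + $wlm_slope * ($fanout - 4)}]
}

puts "fanout 2: [wlm_cap 2] pF"
puts "fanout 6: [wlm_cap 6] pF"
```

Nothing in this calculation knows the wire length or resistance, which is why the estimate degrades once interconnect delay begins to dominate.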

The concept of physical synthesis was recently introduced by Synopsys in the form of Physical Compiler (henceforth called PhyC) as a solution to the above problem. The previous capability of DC is retained and the PhyC enhancements have been added on top of DC, thus making PhyC a superset of DC.

PhyC does not use the wire-load models; instead the delay computation is based on the placement rather than fanout. In other words, the synthesis and optimization is based on the placement of cells. The scan chain re-ordering capability incorporated in the current release of PhyC (2000.11) makes it an extremely powerful and useful tool.

Figure 1-4 illustrates this approach in a very generic form. PhyC can be used in two modes: RTL-to-placed-gates (rtl2pg) or Gates-to-placed-gates (g2pg). For the former mode, the input to PhyC is the RTL and the floorplan information, along with the necessary setup to include logical and physical libraries. The output produced by PhyC is a structural netlist and the placed gates information in PDEF3.0 format. The second mode, g2pg, can be used for optimizing an existing gate level netlist based on the floorplan information. In this case, instead of the RTL, the input to PhyC is the gate level netlist. The rest of the setup and I/O files remain the same.

One important point to note is that the current release of PhyC (version 2000.11) does not have the capability of synthesizing clock trees in the design. Synopsys has recently announced the availability of the Clock Tree Compiler tool that is an add-on to PhyC. Users who do not have access to this tool have no option but to use their layout tool to insert the clock tree in the design database.

A number of steps must be performed in order to perform successful synthesis. These will be discussed in subsequent chapters. For the moment, however, the process illustrated above is sufficient for the purpose of explaining the design flow.

1.3 Chapter Summary

In this chapter the ASIC design flows incorporating the latest tools and technology for very deep sub-micron (VDSM) technologies were reviewed. The flow started with the definition of specification, and ended with physical layout. The emphasis was placed on logic and physical synthesis related topics.

Also introduced was a new concept of physical synthesis as applicable to the design flow to shorten the design cycle of the chip. The need to perform physical synthesis was emphasized to get a better estimation of delays and shorten the time-to-market.

2

TUTORIAL

Synthesis and Static Timing Analysis

This chapter is intended both for beginners and advanced users of Synopsys tools. Novices with no prior experience in synthesis using Synopsys tools are advised to skip this chapter and return to it after reading the rest of the book. Beginners with minimal experience in synthesis may use this chapter as a jump-start to learn the ASIC design process, using Synopsys tools. Advanced users will benefit by using this chapter as a reference.

The chapter offers minimal or no explanation for Synopsys commands (they are explained in subsequent chapters). The emphasis is on outlining the practical aspects of the ASIC design flow described in Chapter 1, with Synopsys synthesis in the center. This helps the reader correlate the theoretical concepts with their practical application.

In order to describe both the traditional and the Physical Compiler based flows, all scripts related to the former are maintained (the commands have been changed to the Tcl format). A separate section based on the Physical Compiler (or PhyC) flow has been added. This provides the users the ability to choose whichever flow best suits their design needs.

Although the previous chapter stressed skipping the gate-level simulation in favor of formal verification techniques, many designers are reluctant to forego the former step. For this reason, this chapter also covers the SDF generation from DC, to be used for simulation purposes. Also, the chapter includes static timing analysis using PrimeTime (PT), in addition to application of formal verification methods, using Formality.

Synthesis and optimization may be performed using any number of approaches. This solely depends upon the methodology you prefer, or are most comfortable using. This chapter uses one such approach that is most commonly used by the Synopsys user community. You may tailor this approach to suit your individual requirements with relative ease.

For the sake of clarity and ease of explanation, the bottom-up compile methodology (described later) is used in all examples and scripts relating to the synthesis process presented in this chapter. Also, it must be noted that the entire ASIC flow is extremely iterative and one should not assume that the process described in this chapter is suitable for all designs. Later chapters discuss each topic in detail, which can be tailored to your designs and methodology.

2.1 Example Design

The best way to start this topic is to go through the whole process on an example design. A tap controller design, coded in Verilog HDL and consisting of one level of hierarchy as shown below, is chosen for this purpose:

tap_controller.v
    tap_bypass.v
    tap_instruction.v
    tap_state.v

The top level of the design is called tap_controller, which instantiates three modules called tap_bypass, tap_instruction and tap_state. This design contains a single 30 MHz clock called "tck" and a reset called "trst". Timing specifications for this design dictate that the setup-time needed for all input signals with respect to "tck" is 10ns, while the hold-time is 0ns. Furthermore, all output signals must be delayed by 10ns with respect to the clock.
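The 30 MHz clock specification fixes the clock period used when constraining the design. A minimal sketch of the arithmetic in plain Tcl follows. The variable names are illustrative, and the 50% duty cycle is an assumption rather than part of the stated specification.

```tcl
# Derive the clock period from the 30 MHz specification.
set freq_mhz  30.0
set period_ns [expr {1000.0 / $freq_mhz}]   ;# period in ns
set fall_ns   [expr {$period_ns / 2.0}]     ;# falling edge, assuming 50% duty cycle

puts [format "period = %.2f ns, falling edge at %.2f ns" $period_ns $fall_ns]
```

The exact period is about 33.33 ns. The synthesis scripts in this chapter use a rounded period of 33 ns with the falling edge at 16.5 ns.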

The process technology targeted for this design is 0.25 micron. In order to achieve greater accuracy in the presence of process variation, two Synopsys standard cell technology libraries, characterized for the worst-case and the best-case process parameters, are used. The libraries are called ex25_worst.db and ex25_best.db, with a corresponding symbol library containing schematic representations, called ex25.sdb. The name of the operating conditions defined in the ex25_worst.db library is WORST, while the name of the operating conditions in the ex25_best.db library is BEST.

It is assumed that the functionality of the design has been verified by dynamically simulating it at the RTL level.

2.2 Initial Setup

The next step is to synthesize the design, i.e., map the design to the gates belonging to the specified technology library. Before we begin synthesis, several setup files must be created as follows:

a) .synopsys_dc.setup file for DC & PhyC.
b) .synopsys_pt.setup file for PT.

The first file is the setup file for DC & PhyC and is used for logic synthesis as well as physical synthesis, while the second file is associated with PT and defines the required setup to be used for static timing analysis.

Create both of these files with the following contents, assuming that the libraries are kept in the directory /usr/golden/library/std_cells/


DC & PhyC .synopsys_dc.setup file

set search_path [list . /usr/golden/library/std_cells]
set target_library [list ex25_worst.db]
set link_library [list {*} ex25_worst.db ex25_best.db]
set symbol_library [list ex25.sdb]
set physical_library [list ex25_worst.pdb]

define_name_rules BORG -allowed {A-Za-z0-9_} \
    -first_restricted "_" -last_restricted "_" \
    -max_length 30 \
    -map {{"*cell*", "mycell"}, {"*-return", "myreturn"}}

set bus_naming_style {%s[%d]}
set verilogout_no_tri true
set verilogout_show_unconnected_pins true
set test_default_scan_style multiplexed_flip_flop

PT .synopsys_pt.setup file

set search_path [list . /usr/golden/library/std_cells]
set link_library [list {*} ex25_worst.db ex25_best.db]

2.3 Traditional Flow

The following steps outline the traditional flow. Here DC is used for logic synthesis while the layout tool handles the rest of the back-end that includes placement and routing.

2.3.1 Pre-Layout Steps

The following sub-sections illustrate the steps involved during the pre-layout phase. This includes one-pass logic synthesis with scan insertion, static timing analysis, SDF generation to perform functional gate-level simulation, and finally formal verification between the source RTL and synthesized netlist.

2.3.1.1 Synthesis

The pre-layout logic synthesis involves optimizing the design for maximum setup-time, utilizing the statistical wire-load models and the worst-case operating conditions from the ex25_worst.db technology library. In order to maximize the setup-time, you may constrain the design by defining clock uncertainty for the setup-time. In general, a 10% over-constraint is sufficient to minimize the synthesis-layout iterations.
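The over-constraint can be worked out with a line of arithmetic. The sketch below, in plain Tcl, assumes the 33 ns clock period used for this design; the variable names are illustrative only.

```tcl
# 10% over-constraint expressed as a setup-time clock uncertainty.
set clk_period        33.0    ;# ns
set overconstrain_pct 10.0
set setup_uncertainty [expr {$clk_period * $overconstrain_pct / 100.0}]

puts [format "setup uncertainty = %.1f ns" $setup_uncertainty]

# In dc_shell-t the value would then be applied to the clock, for example:
#   set_clock_uncertainty -setup $setup_uncertainty [get_clocks tck]
```

This gives 3.3 ns. The synthesis scripts in this chapter apply a setup uncertainty of 3.0 ns, which is in the same range.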

After initial synthesis, if gross hold-time violations are detected, they should be fixed at the pre-layout level. This also helps in reducing the synthesis-layout iterations. However, it is preferable to fix minor hold-time violations after the layout, with real delays back annotated.

In this tutorial, we assume that minor hold-time violations exist, therefore these violations will be fixed during the post-layout optimization. Fixing hold-time violations involves back annotation of the extracted delays from the layout to DC. In addition, hold-time fixes require usage of the best-case operating conditions from the ex25_best.db library.

Generic synthesis script for sub-modules

set active_design tap_bypass

# Read, elaborate and link the design
analyze -format verilog $active_design.v
elaborate $active_design

current_design $active_design
link

uniquify

# Wire-load model and operating conditions
set_wire_load_model -name SMALL
set_wire_load_mode top
set_operating_conditions WORST

# Clock definition
create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_uncertainty -setup 3.0 [get_clocks tck]
set_clock_transition 0.1 [get_clocks tck]
set_dont_touch_network [list tck trst]

# Input drive
set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

# I/O timing
set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

set_max_area 0

# Test ready compile: flops are mapped directly to scan flops
set_fix_multiple_port_nets -buffer_constants -all
compile -scan

check_test

# Clean up and protect the block
remove_unconnected_ports [find -hierarchy cell {"*"}]

change_names -hierarchy -rules BORG

set_dont_touch [current_design]

# Write out the mapped design
write -hierarchy -output $active_design.db
write -format verilog -hierarchy \
      -output $active_design.sv

The above script contains a user-defined variable called active_design that defines the name of the module to be synthesized. This variable is used throughout the script, thus making the rest of the script generic. By re-defining the value of active_design to other sub-modules (tap_instruction and tap_state), the same script may be used to synthesize the sub-modules. Users can apply the same concept to clock names, clock periods, etc. in order to parameterize the scripts.
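One way to reuse the generic script, sketched here for dc_shell-t, is to loop over the sub-module names and source the script once per module. The file name sub_module.tcl is made up for this sketch, and it is assumed to hold the generic script with its first line (the one that sets active_design) removed.

```tcl
# Hypothetical driver script for dc_shell-t.
# Assumes the generic sub-module script has been saved as sub_module.tcl
# without its "set active_design ..." line; the file name is an assumption.
set sub_modules {tap_bypass tap_instruction tap_state}

foreach active_design $sub_modules {
    puts "Synthesizing $active_design"
    source sub_module.tcl
}
```

The same pattern extends to other parameters. For instance, variables holding the clock name and period could be set before the loop and referenced inside the sourced script in place of the literal values.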

Let's assume that you have successfully synthesized three sub-blocks, namely tap_bypass, tap_instruction and tap_state. The one-pass synthesis was done with no scan chain stitching. Only the flops were directly mapped to scan flops. We can apply the same synthesis script to synthesize the top level, with the exception that we have to include the mapped "db" files for the sub-blocks, before reading the tap_controller.v file. In addition, this time we have to perform scan insertion also in order to stitch the scan chains. Also, the wire-load mode may need to be changed to enclosed for proper modeling of the interconnect wires. Since the sub-modules contain the dont_touch attribute, the top-level synthesis will not optimize across boundaries, and may violate the design rule constraints. To remove these violations, you must re-synthesize/optimize the design with the dont_touch attribute removed from the sub-blocks.

DFT scan insertion at the top-level is another reason for removing the dont_touch attribute from the sub-blocks. This is due to the fact that the DFT scan insertion cannot be implemented at the top-level if the sub-blocks contain the dont_touch attribute. The following script exemplifies this process by performing initial synthesis with scan enabled, before re-compiling (compile -only_design_rule) the design with the dont_touch attribute removed from all the sub-blocks.

Synthesis Script for the top-level

set active_design tap_controller

set sub_modules {tap_bypass tap_instruction tap_state}

# Read the mapped db files of the sub-blocks
foreach module $sub_modules {
    set syn_db $module.db
    read_db $syn_db
}

# Read, elaborate and link the top level
analyze -format verilog $active_design.v
elaborate $active_design

current_design $active_design
link

uniquify

# Wire-load model and operating conditions
set_wire_load_model -name LARGE
set_wire_load_mode enclosed
set_operating_conditions WORST

# Clock definition
create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_uncertainty -setup 3.0 [get_clocks tck]
set_clock_transition 0.1 [get_clocks tck]
set_dont_touch_network [list tck trst]

# Input drive
set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

# I/O timing
set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

set_max_area 0

set_fix_multiple_port_nets -all -buffer_constants

compile -scan

# Remove dont_touch from the sub-blocks before scan insertion
remove_attribute [find -hierarchy design {"*"}] dont_touch

current_design $active_design
uniquify

# Scan insertion
check_test
create_test_patterns -sample 10
preview_scan
insert_scan
check_test

# Fix design rule violations only
compile -only_design_rule

remove_unconnected_ports [find -hierarchy cell {"*"}]
change_names -hierarchy -rules BORG

set_dont_touch [current_design]

# Write out the mapped design
write -hierarchy -output $active_design.db
write -format verilog -hierarchy \
      -output $active_design.sv

2.3.1.2 Static Timing Analysis using PrimeTime

After successful synthesis, the netlist obtained must be analyzed to check for timing violations. The timing violations may consist of setup and/or hold-time violations.

The design was synthesized with emphasis on maximizing the setup-time, therefore you may encounter very few setup-time violations, if any. However, hold-time violations will generally occur at this stage. This is due to the data arriving too fast at the input of sequential cells (data changing its value before being latched by the sequential cells), thereby violating the hold-time requirements.

If the design is failing setup-time requirements, then you have no other option but to re-synthesize the design, targeting the violating path for further optimization. This may involve grouping the violating paths or over-constraining the entire sub-block that had violations. However, if the design is failing hold-time requirements, you may either fix these violations at the pre-layout level, or postpone this step until after layout. Many designers prefer the latter approach for minor hold-time violations (also used here), since the pre-layout synthesis and timing analysis use the statistical wire-load models, and fixing the hold-time violations at the pre-layout level may result in setup-time violations for the same path after layout. It must be noted that gross hold-time violations should be fixed at the pre-layout level, in order to minimize the number of hold-time fixes that may be needed after the layout.
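The trade-off between the two checks can be made concrete with a little arithmetic. The sketch below is a simplified model, not the calculation performed by PT: it assumes an ideal clock with the same latency at the launching and capturing flops, a single-cycle path, and path delays that already include the clock-to-output delay of the launching flop. The period and the uncertainty values are taken from the pre-layout scripts of this tutorial; the flop requirements and path delays are hypothetical.

```python
def setup_slack(period, data_delay_max, setup_time, uncertainty):
    """Margin before the next capturing edge; negative means a setup violation."""
    return (period - uncertainty - setup_time) - data_delay_max


def hold_slack(data_delay_min, hold_time, uncertainty):
    """Margin after the launching edge; negative means a hold violation."""
    return data_delay_min - (hold_time + uncertainty)


PERIOD = 33.0                      # tck period used in this tutorial
SETUP_TIME, HOLD_TIME = 0.5, 0.3   # hypothetical flop requirements

# A short register-to-register path; both delays are hypothetical.
d_min, d_max = 0.4, 1.2
print("setup slack:", setup_slack(PERIOD, d_max, SETUP_TIME, uncertainty=3.0))
print("hold slack: ", hold_slack(d_min, HOLD_TIME, uncertainty=0.2))

# Padding the path repairs hold, but the same delay is charged against setup.
pad = 0.3
print("hold slack after padding: ", hold_slack(d_min + pad, HOLD_TIME, uncertainty=0.2))
print("setup slack after padding:", setup_slack(PERIOD, d_max + pad, SETUP_TIME, uncertainty=3.0))
```

In this model the data arrives before the hold window has closed, so the hold slack is negative, while the setup slack is large. Any delay added to repair hold reduces the setup slack of the same path by the same amount, which is why hold fixes based on wire-load estimates are often deferred until after layout.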


PT script for pre-layout setup-time analysis

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_wire_load_model -name large
set_wire_load_mode top

set_operating_conditions WORST

set_load 50.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

# Ideal clock: latency and transition approximate the future clock tree
create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_transition 0.2 [get_clocks tck]
set_clock_uncertainty 3.0 -setup [get_clocks tck]

set_input_delay 20.0 -clock tck [all_inputs]
set_output_delay 10.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins]
report_timing -to [all_outputs]

write_sdf -context verilog -output $active_design.sdf

The above PT script performs the static timing analysis for the tap_controller design. Notice that the clock latency and transition are fixed in the above example, because at the pre-layout level the clock tree has not been inserted. Therefore, it is necessary to define a certain amount of delay that approximates the final delay associated with the clock tree. Also, the clock transition is specified because of the high fanout associated with the clock network. The high fanout suggests that the clock network is driving many flip-flops, each having a certain amount of pin capacitance. This gives rise to slow input ramp time for the clock. The fixed transition value of the clock (again approximating the final clock tree number) prevents PT from calculating incorrect delay values that are based upon the slow input ramp to the flops.

The script to perform the hold-time analysis at the pre-layout level is shown below. To check for hold-time violations, the analysis must be performed utilizing the best-case operating conditions, specified in the ex25_best.db library. In addition, an extra argument (-delay_type min) is specified in the report_timing command, as follows:

PT script for pre-layout hold-time analysis

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_wire_load_model -name large
set_wire_load_mode top

set_operating_conditions BEST

set_load 50.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_transition 0.2 [get_clocks tck]
set_clock_uncertainty 0.2 -hold [get_clocks tck]

set_input_delay 0.0 -clock tck [all_inputs]
set_output_delay 0.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins] \
              -delay_type min

report_timing -to [all_outputs] -delay_type min

write_sdf -context verilog -output $active_design.sdf

2.3.1.3 SDF Generation

To perform timing simulation, you will need the SDF file for back annotation. The static timing was performed using PT; therefore it is prudent that the SDF file be generated from PT itself, as shown in the previous scripts. However, some designers feel comfortable using DC to generate the SDF file. We will therefore use DC to generate the SDF in this section.

In addition, depending on the design, the resultant SDF file may require a certain amount of "massaging" before it can be used to perform timing simulation of the design. The reason for massaging is explained in detail in Chapter 11.

The following script may be used to generate the pre-layout SDF for the tap_controller design. This SDF file is targeted for simulating the design dynamically with timing. In addition, the script also generates the timing constraints file. Though this file is also in SDF format, it is solely used for forward annotating the timing information to the layout tool in order to perform timing driven layout using the traditional approach.

DC script for pre-layout SDF generation

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

set_wire_load_model -name LARGE
set_wire_load_mode top
set_operating_conditions WORST

create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_transition 0.2 [get_clocks tck]
set_clock_uncertainty 3.0 -setup [get_clocks tck]

set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

set_load 50 [all_outputs]

set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

# SDF for timing simulation
write_sdf -output $active_design.sdf

# Constraints file (SDF format) for timing driven layout
write_constraints -format sdf -cover_design \
                  -output constraints.sdf

2.3.1.4 Floorplanning and Routing

The floorplanning step involves physical placement of cells and clock tree synthesis. Both these steps are performed within the layout tool. The placement step may include timing driven placement of the cells, which is performed by annotating the constraints.sdf file (generated by DC) to the layout tool. This file consists of path delays that include the cell-to-cell timing information. This information is used by the layout tool to place cells with timing as the main criterion, i.e., the layout tool will place timing critical cells closer to each other in order to minimize the path delay.

Let's assume that the design has been floorplanned. Also, the clock tree has been inserted in the design by the layout tool. The clock tree insertion modifies the existing structure of the design. In other words, the netlist in the layout tool is different from the original netlist present in DC. This is because the design present in the layout tool contains the clock tree, whereas the original design in DC does not contain this information. Therefore, the clock tree information should somehow be transferred to the design residing in DC or PT. The new netlist (containing the clock tree information) should be formally verified against the original netlist to ensure that the transfer of the clock tree did not break the functionality of the original logic. Various methods of transferring the clock tree information to the design are explored in detail in Chapter 9. For the sake of simplicity, let us assume that the clock tree information is present in the tap_controller design.

The design is now ready for routing. In a broad sense, routing is performed in two phases: global route and detailed route. During global route, the router divides the layout surface into separate regions and performs a point-to-point "loose" routing without actually placing the geometric wires. The final routing is performed by the detailed router, which physically places the geometric wires and routes within the regions. These types of routing are explained in full in Chapter 9. Let's assume that the design has been global routed.

The next step involves extracting the estimated parasitic capacitances and RC delays from the global routed design. This step reduces the synthesis-layout iteration time, especially since cell placement and global routing may take much less time than detailed routing the entire chip. However, if the cells are placed optimally with minimal congestion, detailed routing is also very fast. In any case, extraction of delays after the global route phase (albeit estimates) provides a faster method of getting closer to the real delay values that are extracted from the layout database after the detailed routing phase.

Back annotate the estimates to the design in PT for setup and hold-time static timing analysis, using the following scripts.

PT script for setup-time analysis, using estimated delays

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_operating_conditions WORST

set_load 50.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

source capacitance.pt     ;# estimated parasitic capacitances
read_sdf rc_delays.sdf    ;# estimated RC delays

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty 0.5 -setup [get_clocks tck]

set_input_delay 20.0 -clock tck [all_inputs]
set_output_delay 10.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins]
report_timing -to [all_outputs]

PT script for hold-time analysis, using estimated delays

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_operating_conditions BEST

set_load 20.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

source capacitance.pt     ;# estimated parasitic capacitances
read_sdf rc_delays.sdf    ;# estimated RC delays

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty 0.05 -hold [get_clocks tck]

set_input_delay 0.0 -clock tck [all_inputs]
set_output_delay 0.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins] \
              -delay_type min

report_timing -to [all_outputs] -delay_type min

The above scripts back annotate the capacitance.pt and rc_delays.sdf files. The capacitance.pt file contains the capacitive loading per net of the design in set_load format, while the rc_delays.sdf file contains point-to-point interconnect RC delays of individual nets. DC (and PT) performs the calculation of cell delay, based upon the output net loading and input slope of each cell in the design. The reason for using this approach is explained in detail in Chapter 9.
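To see why the annotated net loading matters, it may help to picture how a cell delay can depend on input slope and output load. One common way of modeling this dependence is a two-dimensional lookup table with interpolation between characterized points. The sketch below assumes such a table; the text does not state which delay model the example library uses, and every number and name here (SLEWS, LOADS, DELAY, cell_delay) is hypothetical.

```python
from bisect import bisect_right

# Hypothetical characterization table for one timing arc (all numbers invented).
SLEWS = [0.1, 0.5, 1.0]      # input transition, ns
LOADS = [0.01, 0.05, 0.20]   # output net capacitance, pF
DELAY = [                    # DELAY[i][j] for SLEWS[i] and LOADS[j], ns
    [0.10, 0.18, 0.45],
    [0.14, 0.22, 0.50],
    [0.20, 0.29, 0.58],
]


def _segment(axis, x):
    """Index of the lower grid point of the segment used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def cell_delay(slew, load):
    """Bilinear interpolation of the table at (slew, load)."""
    i, j = _segment(SLEWS, slew), _segment(LOADS, load)
    ts = (slew - SLEWS[i]) / (SLEWS[i + 1] - SLEWS[i])
    tl = (load - LOADS[j]) / (LOADS[j + 1] - LOADS[j])
    low = DELAY[i][j] * (1 - tl) + DELAY[i][j + 1] * tl
    high = DELAY[i + 1][j] * (1 - tl) + DELAY[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


if __name__ == "__main__":
    # Same cell and input slope, lightly loaded net versus heavily loaded net.
    print(cell_delay(0.3, 0.02), cell_delay(0.3, 0.10))
```

With the same input slope, the heavier load yields the larger delay, so replacing a wire-load estimate of the net capacitance with an extracted value changes the computed cell delay directly.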

If the design fails setup-time requirements, you may re-synthesize the design with adjusted constraints or re-floorplan the design. If the design is failing hold-time requirements, then depending on the degree of violation you may decide to proceed to the final step of detailed routing the design, or re-optimize the design with adjusted constraints.

If re-synthesis is desired, then the floorplan (placement) information consisting of the physical clusters and cell locations should be back annotated to DC. This step is desired because until now, DC did not know the physical placement information of cells. By annotating the placement information to DC, the post-layout optimization of the design within DC is vastly improved. The layout tool generates the physical information in PDEF format that can be read by DC, using the following command:

read_clusters <file name in PDEF format>


The script to perform this is similar to the initial synthesis script, with the exception of the annotated data and incremental compilation of the design, as illustrated below:

Script for incremental synthesis of the design

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

source capacitance.dc               ;# estimated parasitic capacitances
read_timing -f sdf rc_delays.sdf    ;# estimated RC delays
read_clusters clusters.pdef         ;# physical information

create_wire_load -hierarchy \
                 -percentile 80 \
                 -output cwlm.txt

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]
set_clock_transition 0.2 [get_clocks tck]
set_clock_uncertainty 3.0 -setup [get_clocks tck]

set_dont_touch_network [list tck trst]

set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

set_max_area 0

set_fix_multiple_port_nets -all -buffer_constants

reoptimize_design -in_place

write -hierarchy -output $active_design.db
write -format verilog -hierarchy \
      -output $active_design.sv

The create_wire_load command used in the above script creates a custom wire-load model for the tap_controller design. The initial synthesis run used the wire-load models present in the technology library, which are not design specific. Therefore, in order to achieve better accuracy for the next synthesis iteration, the custom wire-load models specific to the design should be used.

The following command may be used to update the technology library present in DC's memory to reflect the new custom wire-load models. For example:

dc_shell-t> update_lib ex25_worst.db cwlm.txt

Let's assume that the design has been re-analyzed and is now passing both setup and hold-time requirements. The next step is to detail route the design. This is a layout dependent feature and therefore will not be discussed here.

2.3.2 Post-Layout Steps

The post-layout steps involve verifying the design for timing with actual delays back annotated; functional simulation of the design; and lastly, performing LVS and DRC.

Let us presume that the design has been fully routed with minimal congestion and area. The finished layout surface must then be extracted to get the actual parasitic capacitances and interconnect RC delays. Depending upon the layout tool and the type of extraction, the extracted values are generally written out in the SDF format for the interconnect RC delays, while the parasitic information is generated as a string of set_load commands for each net in the design. In addition, if a hierarchical place and route has been performed, the physical placement location of cells in the PDEF format should also be generated.


2.3.2.1 Post-Layout Static Timing Analysis using PrimeTime

The first step after layout is to perform static timing on the design, using the actual delays. The post-route timing analysis uses the same commands as the post-placement analysis, except that this time the actual delays are back annotated to the design.

Predominantly, the timing of the design is dependent upon clock latency and skew. It is therefore prudent to perform the clock skew analysis before attempting to analyze the whole design. A useful Tcl script is provided by Synopsys through their on-line support on the web, called SolvNET. You may download this script and run the analysis before proceeding. Let us assume that the clock latency and skew are within limits. The next step is to perform the static timing on the design, to check the setup and hold-time violations (if any) using the following scripts:

PT script for setup-time analysis, using actual delays

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_operating_conditions WORST

set_load 50.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

source capacitance.pt                  ;# actual parasitic capacitances
read_sdf rc_delays.sdf                 ;# actual RC delays
read_parasitics clock_info_wrst.spf    ;# for clocks etc.

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty 0.5 -setup [get_clocks tck]

set_input_delay 20.0 -clock tck [all_inputs]
set_output_delay 10.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins]
report_timing -to [all_outputs]

PT script for hold-time analysis, using actual delays

set active_design tap_controller

read_db -netlist_only $active_design.db

current_design $active_design

set_operating_conditions BEST

source capacitance.pt                  ;# actual parasitic capacitances
read_sdf rc_delays.sdf                 ;# actual RC delays
read_parasitics clock_info_best.spf    ;# for clocks etc.

set_load 50.0 [all_outputs]
set_driving_cell -cell BUFF1X -pin Z [all_inputs]

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty 0.05 -hold [get_clocks tck]

set_input_delay 0.0 -clock tck [all_inputs]
set_output_delay 0.0 -clock tck [all_outputs]

report_constraint -all_violators

report_timing -to [all_registers -data_pins] \
              -delay_type min

report_timing -to [all_outputs] -delay_type min
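The clock skew check recommended before running the scripts above amounts to simple arithmetic once the clock arrival times are known. The sketch below is not the SolvNET script mentioned earlier, which is not reproduced in this text. It assumes that skew is measured as the difference between the latest and the earliest clock arrival; the pin names, arrival times and the limit are all hypothetical.

```python
def latency_and_skew(arrivals):
    """Return (largest latency, skew), with skew = latest - earliest arrival."""
    times = list(arrivals.values())
    return max(times), max(times) - min(times)


# Hypothetical clock arrival times in ns at three flip-flop clock pins.
arrivals = {"flop_a/CP": 2.05, "flop_b/CP": 2.11, "flop_c/CP": 1.98}

latency, skew = latency_and_skew(arrivals)
SKEW_LIMIT = 0.5   # hypothetical budget for this design
print("latency:", latency, "skew within limit:", skew <= SKEW_LIMIT)
```

If the skew exceeds the budget, the clock tree should be revisited in the layout tool before the full timing analysis is attempted.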


2.3.2.2 Post-Layout Optimization

The post-layout optimization or PLO may be performed on the design to improve or fix the timing requirements. DC provides several methods of fixing timing violations, through the in place optimization (or IPO) feature. As before, DC also makes use of the physical placement information to perform location based optimization (LBO). In this example, we will use the cell resizing and buffer insertion feature of the IPO to fix the hold-time violations.

2.3.2.2.1 Hold-Time Fixes

The design was synthesized for maximum setup-time requirements. Timing was verified at each step (after synthesis and then, after the global route phase), therefore in all probability the routed design will pass the setup-time requirements. However, some parts of the design may fail hold-time requirements at various endpoints.

If the design fails the hold-time requirements then you should fix the violations by adding buffers to delay the arrival time of the failing signals, with respect to the clock. Let's assume that the design is failing hold-time requirements at multiple endpoints.

There are various approaches to fix the hold-time violations. Such methods are discussed in detail in Chapter 9. In this example, we will utilize the dc_shell-t commands to fix the hold time violations, as illustrated below:

DC Script to fix the hold-time violations

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

source capacitance.dc               ;# actual parasitic capacitances
read_timing -f sdf rc_delays.sdf    ;# actual RC delays
read_clusters clusters.pdef         ;# physical hierarchy info

create_clock -period 33 -waveform {0 16.5} tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty -hold 0.05 tck

set_dont_touch_network [list tck trst]

set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

set_input_delay -min 0.0 -clock tck -max [all_inputs]
set_output_delay -min 0.0 -clock tck -max [all_outputs]

set_fix_hold tck                    ;# fix hold-time violations w.r.t. tck

reoptimize_design -in_place

write -hierarchy -output $active_design.db
write -format verilog -hierarchy \
      -output $active_design.sv

In the above script, the set_fix_hold command instructs DC to fix hold-time violations with respect to the clock tck. The -in_place argument of the reoptimize_design command is the IPO option, which is regulated by various variables that are described in Chapter 9. Making use of these variables, DC inserts or resizes the gates to fix the hold time violations. The LBO variables are helpful in inserting the buffers at the correct location, so as to minimize their impact on other logic paths leading off from the violating path.
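The effect of buffer insertion on a single violating endpoint can be estimated by hand. The sketch below is only a back-of-the-envelope model and does not describe how DC chooses or places buffers: it assumes that identical buffers are inserted in series, that their best-case delay counts toward the hold fix, and that their worst-case delay is charged against the setup slack of the same path. All delay values and function names are hypothetical.

```python
import math


def buffers_needed(hold_slack, buf_delay_min):
    """Fewest buffers whose best-case delay covers a negative hold slack."""
    if hold_slack >= 0:
        return 0
    return math.ceil(-hold_slack / buf_delay_min)


def setup_slack_after(setup_slack, count, buf_delay_max):
    """Setup slack left once the buffers are charged at worst-case delay."""
    return setup_slack - count * buf_delay_max


# Hypothetical endpoint: 0.30 ns hold violation and 5.0 ns of setup margin.
n = buffers_needed(-0.30, buf_delay_min=0.12)
print("buffers:", n)
print("remaining setup slack:", setup_slack_after(5.0, n, buf_delay_max=0.25))
```

A path with ample setup margin absorbs the added delay easily, whereas a path that is critical for both checks may need resizing or a different insertion point instead.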

After IPO, the design should again be analyzed through PT, using the post-layout PT script illustrated before, to ensure that the violations have been fixed.

Once the design passes all timing requirements, the post-layout SDF may be generated (from PT or DC) for simulation purposes, if needed. We will use DC to generate the worst-case post-layout SDF using the script provided below. A similar script may be used to generate the best-case SDF. Obviously, you need to back annotate the best-case extracted numbers from the layout tool to generate the best-case SDF from DC. This solely depends on the layout tool and the methodology being used.

DC script for worst-case post-layout SDF generation

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

set_operating_conditions WORST

source capacitance.dc        ;# actual parasitic capacitances
read_timing rc_delays.sdf    ;# actual RC delays

create_clock -period 33 -waveform {0 16.5} tck
set_propagated_clock [get_clocks tck]
set_clock_uncertainty -setup 0.5 [get_clocks tck]

set_driving_cell -cell BUFF1X -pin Z [all_inputs]
set_drive 0 [list tck trst]

set_load 50 [all_outputs]

set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

write_sdf -output $active_design.sdf

It is recommended that formal verification be performed again between the source RTL and the final netlist, to check for any errors that may have been unintentionally introduced during the whole process. This is the final step; the design is now ready for LVS and DRC checks, before tape-out.


2.4 Physical Compiler Flow

The Physical Compiler (or PhyC) flow provides an integrated approach to synthesis and placement combined. It does not utilize the traditional approach of using the wire-load models, thus minimizing the discrepancy between the pre-layout synthesis and post-layout delays.

Due to the importance of this technique and its novel nature, a complete chapter has been devoted to this flow. Instead of duplicating the scripts here, all the scripts related to this flow have been illustrated in that chapter. Readers are advised to read Chapter 10 to learn about the PhyC flow, in order to understand how PhyC is used and where it fits within the synthesis and layout approach we have been so used to.

2.5 Chapter Summary

This chapter highlighted the practical side of the ASIC design methodology in the form of a tutorial. An example design was used to guide the reader from start to finish. At each stage, a brief explanation and relevant scripts were provided.

The chapter started with the basics of setting up the Synopsys environment and the technical specification of the example design. Further sections were divided into pre-layout, floorplanning and routing, and finally the post-layout steps.

The pre-layout steps included initial synthesis and scan insertion of the design, along with static timing analysis, and SDF generation for dynamic simulation. In order to minimize the synthesis-layout iterations, the floorplanning and routing section stressed the placement of cells, with emphasis on back annotating to DC the estimated delays extracted after global routing the design. The final section used post-layout optimization techniques to fix the hold-time violations, and to generate the final SDF for simulation.

The application of the formal verification method using Synopsys Formality was also included. This section did not contain any scripts, but the reader was made aware of the usefulness of formal techniques and where they are applied.

Finally, the Physical Synthesis approach was discussed, which eliminates the need for wire-load models and integrates the synthesis with placement. All scripts related to this approach are provided in Chapter 10.


3

BASIC CONCEPTS

This chapter covers the basic concepts related to synthesis, using the Synopsys suite of tools. These concepts introduce the reader to synthesis terminology used throughout the later chapters. These terms provide the necessary framework for Synopsys synthesis and static timing analysis.

Although this chapter is a good reference, advanced users of Synopsys tools, already familiar with Synopsys terminology, may skip this chapter.

3.1 Synopsys Products

This section briefly describes all relevant Synopsys products related to this book.

a) Library Compiler
b) Design Compiler and Design Vision
c) Physical Compiler
d) PrimeTime
e) DFT Compiler
f) Formality


Library Compiler

The core of any ASIC design is the technology library containing a set of logic cells. The library may contain functional description, timing, area and other pertinent information of each cell. Library Compiler (LC) parses this textual information for completeness and correctness, before converting it to a format used globally by all Synopsys applications.

Library Compiler is invoked by typing lc_shell in a UNIX shell. All the capabilities of the LC can also be utilized within dc_shell.

Design Compiler and Design Vision

The Synopsys Design Compiler (DC) and Design Vision (DV) comprise a powerful suite of logic synthesis products, designed to provide an optimal gate-level synthesized netlist based on the design specifications and timing constraints. In addition to high-level synthesis capabilities, the suite also incorporates a static timing analysis engine, along with solutions for FPGA synthesis and links-to-layout (LTL).

Design Compiler is the command line interface of the Synopsys synthesis tool and is invoked by typing either dc_shell or dc_shell-t in a UNIX shell. The dc_shell is the original format that is based on Synopsys's own language, while dc_shell-t uses the standard Tcl language. This book focuses only on the Tcl version of DC because of the commonality with other Synopsys tools, like PrimeTime.

Design Vision is the graphical front-end version of DC and is launched by typing design_vision. Design Vision also supports schematic generation, with critical path analysis through point-to-point highlighting.

Although beginners may initially prefer using DV, they quickly migrate to using DC as they become more familiar with Synopsys commands.

Physical Compiler

Physical Compiler (or PhyC) is a new tool by Synopsys that is a superset of DC. In addition to incorporating all the synthesis and optimization capabilities of DC, it also provides the ability to concurrently place cells optimally, based on the timing and/or area constraints of the design.


PhyC is invoked by typing psyn_shell. A separate GUI version is also available, which is launched by typing psyn_gui. Although slow by comparison, psyn_gui provides the users the ability to traverse between the logical and the schematic view of the design.

It must be noted that, PhyC being the superset of DC, all dc_shell commands are available within psyn_shell. The reverse is not true: you cannot use psyn_shell commands within dc_shell.

PrimeTime

PrimeTime (PT) is the Synopsys sign-off quality, full chip, gate-level static timing analysis tool. In addition, it also allows for comprehensive modeling capabilities, often required by large designs.

PT is faster compared to DC's internal static timing analysis engine. It also provides enhanced analysis capabilities, both textually and graphically. In contrast to the rest of Synopsys tools, this tool is Tcl language based, therefore providing powerful features of that language to promote the analysis and debugging of the design.

PT is a stand-alone tool and can be invoked as a command line interface or graphically. To use the command line interface, type pt_shell in the UNIX window, or type primetime for the graphical version.

DFT Compiler

The DFT Compiler (DFTC) is the Synopsys test insertion tool that is incorporated within the DC suite of tools. The DFTC is used to insert DFT features, like scan insertion and boundary scan, into the design. All DFTC commands are directly invoked from dc_shell or psyn_shell.

Formality

Formality is the Synopsys formal verification or, more precisely, logic equivalence checking tool. The tool features enhanced graphical debugging capabilities that include schematic representation of logic under verification, and visual suggestions annotated to the schematic as pointers of possible incorrect logic. It also provides suggestions for possible fixes to the design.


3.2 Synthesis Environment

As with most EDA products, Synopsys tools require a setup file that specifies the technology library location and other parameters used for synthesis. Synopsys also defines its own format for storing and processing the information. This section highlights such details.

3.2.1 Startup Files

There is a common startup file called ".synopsys_dc.setup" for all tools of the DC and PhyC family. A separate startup file is required for PT, which is named ".synopsys_pt.setup". These files are in Tcl format, and contain path information to the technology libraries and other environment variables.

The default startup files for PhyC, DC and PT reside in the Synopsys installation directory, and are automatically loaded upon invocation of these tools. These default files do not contain the design dependent data. Their function is to load the Synopsys technology independent libraries and other parameters. The user specifies the design dependent data in the startup files. During startup, these tools read the files in the following order:

1. Synopsys installation directory.
2. User's home directory.
3. Project working directory.

The settings specified in the startup files residing in the project working directory override the ones specified in the home directory and so forth, i.e., the configuration specified in the project working directory takes precedence over all other settings.
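The precedence rule follows naturally from the read order: a value set by a file read later replaces the value set earlier. The following minimal sketch, written in Python for illustration, models each startup file as a dictionary of settings. The file contents shown are hypothetical and the function name effective_settings is not part of any Synopsys tool.

```python
def effective_settings(*files_in_read_order):
    """Apply settings in read order; a later file overrides an earlier one."""
    merged = {}
    for settings in files_in_read_order:
        merged.update(settings)
    return merged


# Hypothetical contents of the three startup files.
installation = {"search_path": ["<install>/libraries"]}
home = {
    "search_path": [".", "/usr/golden/library/std_cells"],
    "target_library": ["old_cells_lib.db"],
}
project = {"target_library": ["std_cells_lib.db"]}

final = effective_settings(installation, home, project)
print(final["search_path"])      # taken from the home directory file
print(final["target_library"])   # taken from the project directory file
```

A setting that the project file does not mention keeps the value from the home directory, which is why stale entries in a home directory file can silently affect a project.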

It is up to the discretion of the user to keep these files wherever it isconvenient. However, it is recommended that the design dependent startupfiles be kept in the working directory.

The minimum information required by DC is the search_path, target_library, link_library and symbol_library. PhyC requires the physical_library information in addition to the DC-related setup. PT requires only the search_path and link_library information (PT names the latter variable link_path). Typical startup files are shown in Example 3.1.

Example 3.1: Setup Files

PhyC & DC .synopsys_dc.setup file

set search_path [list . /usr/golden/library/std_cells \
                      /usr/golden/library/pads]

set target_library [list std_cells_lib.db]
set physical_library [list std_cells_lib.pdb pad_lib.pdb]
set link_library [list {*} std_cells_lib.db pad_lib.db]
set symbol_library [list std_cells_lib.sdb pad_lib.sdb]

PT .synopsys_pt.setup file

set search_path [list . /usr/golden/library/std_cells \
                      /usr/golden/library/pads]

set link_path [list {*} std_cells_lib.db pad_lib.db]

3.2.2 System Library Variables

At this time, it is worth explaining the difference between the target_library and the link_library system variables. The target_library specifies the name of the technology library whose cells the designers want DC to infer and finally map to. The link_library defines the name of the library of cells used solely for reference, i.e., cells in the link_library are not inferred by DC. For example, you may specify a standard cell technology library as the target_library, while specifying the pad technology library name and all other macros (RAMs, ROMs etc.) in the link_library list. This means that the user would synthesize the design targeting the cells present in the standard cell library, while linking to the pads and macros that are instantiated in the design. If the pad library is included in the target_library list, then DC may use the pads to synthesize the core logic.

The target library name should also be included in the link_library list, as shown in Example 3.1. This is important while reading the gate-level netlist in DC. DC will not be able to link to the mapped cells in the netlist if the target library name is not included in the link library list. In this case, DC generates a warning stating that it was unable to resolve the references for the cells present in the netlist.
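The effect of leaving the target library out of the link_library list can be sketched as a simple lookup. The Python below is a conceptual model, not DC's linker: the function name, the library contents and the netlist references are all hypothetical.

```python
def unresolved_references(netlist_refs, link_library_cells):
    """Return the referenced cell names found in none of the link libraries."""
    known = set()
    for cells in link_library_cells.values():
        known.update(cells)
    return sorted(set(netlist_refs) - known)


# Hypothetical library contents and gate-level netlist references.
libs_without_target = {"pad_lib.db": {"PAD_IN", "PAD_OUT"}}
libs_with_target = {
    "std_cells_lib.db": {"INV2X", "DFF1"},
    "pad_lib.db": {"PAD_IN", "PAD_OUT"},
}
refs = ["INV2X", "DFF1", "PAD_IN"]

missing = unresolved_references(refs, libs_without_target)
resolved = unresolved_references(refs, libs_with_target)
print(missing)   # standard cells cannot be resolved
print(resolved)  # every reference is found
```

In the first call the mapped standard cells have nowhere to resolve to, which corresponds to the unresolved-reference warning described above.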

The target_library and link_library system variables allow the designer to better control the mapping of cells. These variables also provide a useful means to re-map a gate-level netlist from one technology to another. In this case, the link_library may contain the old technology library name, while the target_library may contain the new technology library. Re-mapping can be performed by using the translate command in dc_shell.

The symbol_library system variable holds the name of the library containing the graphical representation of the cells in the technology library. It is used to represent the gates schematically while using the graphical front-end tool, DA. The symbol libraries are identified with an “sdb” extension. If this variable is omitted from the setup file, DA will use a generic symbol library called “generic.sdb” to create schematics. Generally, all technology libraries provided by the library vendor include a corresponding symbol library. It is imperative that there be an exact match of the cell names and the pin names between the technology and the symbol library. Any mismatch in a cell will cause DA to reject the cell from the symbol library, and use the cell from the generic library.

In addition to specifying logical libraries, physical libraries also need to be identified if using psyn_shell. These libraries contain the physical information (like physical dimensions of cells, orientation, layer information etc.) needed by PhyC. The physical libraries are identified with a “pdb” extension and are referenced by the physical_library system variable as shown in the above example.

It must be noted that DC uses the link_library variable, whereas PT calls it link_path. Apart from the difference in name and format, the application of both these variables is identical. Since PT is a gate-level static timing analyzer, it only works on structural gate-level netlists. Thus, PT does not utilize the target_library variable.

3.3 Objects, Variables and Attributes

Synopsys supports a number of objects, variables and attributes in order to streamline the synthesis process. Using these, designers can write powerful scripts to automate the synthesis process. It is therefore essential for designers to familiarize themselves with these terms.

3.3.1 Design Objects

There are eight different types of design objects categorized by DC. These are:

Design: It corresponds to the circuit description that performs some logical function. The design may be stand-alone or may include other sub-designs. Although a sub-design may be part of the design, it is treated as another design by Synopsys.

Cell: It is the instantiated name of the sub-design in the design. In Synopsys terminology, there is no differentiation between a cell and an instance; both are treated as a cell.

Reference: This is the definition of the original design to which the cell or instance refers. For example, a leaf cell in the netlist must be referenced from the link library, which contains the functional description of the cell. Similarly, an instantiated sub-design (called a cell by Synopsys) must be referenced in the design, which contains the functional description of the instantiated sub-design.

Port: These are the primary inputs, outputs or IO’s of the design.

Pin: It corresponds to the inputs, outputs or IO’s of the cells in the design. (Note the difference between port and pin.)

Net: These are the signal names, i.e., the wires that hook up the design together by connecting ports to pins and/or pins to each other.

Clock: The port or pin that is identified as a clock source. The identification may be internal to the library or it may be done using dc_shell-t commands.

Library: Corresponds to the collection of technology-specific cells that the design is targeting for synthesis, or linking for reference.

3.3.2 Variables

Variables are placeholders used by DC for the purpose of storing information. The information may relate to instructions for tailoring the final netlist, or it may contain a user-defined value to be used for automating the synthesis process. Some variables are pre-defined by DC and may be used by the designer to obtain the current value stored in the variable. For example, the variable called “bus_naming_style” has a special meaning to DC, while “captain_picard” has no meaning to DC. The latter may be used to hold any user-defined value for scripting purposes.

All variables are global and last only for the duration of the session. They are not saved along with the design database. Upon completion of the dc_shell-t session, the values of the variables are lost. Most dc_shell-t variables have a default value associated with them, which they inherit at the start of the session. For instance, the following variable uses “SYNOPSYS_UNCONNECTED_” as its default value. You may change it to “MY_DANGLE_” and DC will write out the Verilog netlist with “MY_DANGLE_” as the prefix of all unconnected nets.

dc_shell-t> set verilogout_unconnected_prefix "MY_DANGLE_"

A list of all DC variables may be obtained by using the following DC command:

dc_shell-t> printvar *


For a particular type of variable (say, test related) use the following and DC will return all variables which have the “test” string in them.

dc_shell-t> printvar *test*

3.3.3 Attributes

Attributes are similar in nature to variables. Both store information. However, attributes store information on a particular design object such as nets, cells or clocks.

Generally, attributes are pre-defined and have special meaning to DC, though designers may set their own attributes if desired. For example, dont_touch is a pre-defined attribute; the set_dont_touch command sets it on a design, thereby disabling DC optimization on that design.

Attributes are set on and retrieved from the design object by using the following commands:

set_attribute <object list> <attribute name> <attribute value>

get_attribute <object list> <attribute name>

For example, one may use the following commands to find the default max_transition value set in the library (called “STD_LIB”) and the area attribute set for an inverter (INV2X) in that library:

dc_shell-t> get_attribute STD_LIB default_max_transition

dc_shell-t> get_attribute STD_LIB/INV2X area

Attributes may be removed from design objects by using the following dc_shell-t command:

remove_attribute <object list> <attribute name>

3.4 Finding Design Objects

Among the most useful commands provided by DC and PT are the get_* commands. Sometimes it becomes necessary to locate objects in dc_shell-t for the purpose of scripting or automating the synthesis process. The get_* commands are used to locate a list of design or library objects in DC. Several types of get_* commands are provided. Examples are:

get_ports, get_nets, get_designs, get_lib_cells, get_cells, get_clocks etc.

A full list of get_* commands may be found by typing “help get_*” at the dc_shell-t command line.

Some examples of the get_* command are:

dc_shell-t> set_dont_touch [get_designs blockA]

Applies the dont_touch attribute on the design called blockA.

dc_shell-t> remove_attribute [get_designs *] dont_touch

Removes the dont_touch attribute from the whole design.

dc_shell-t> get_lib_cells stdcells_lib/*

Lists all library cells of the library called stdcells_lib.

dc_shell-t> get_pins stdcells_lib/DFF1/*

Lists all the pins of DFF1 cell present in the library called stdcells_lib.

dc_shell-t> set_dont_touch_network [get_ports [list clk scan_en] ]

Applies the dont_touch_network attribute on the specified ports.


3.5 Synopsys Formats

Most Synopsys products support and share a common internal structure, called the “db” format. The db files are the binary compiled forms representing the text data, be it the RTL code, the mapped gate-level designs, or the Synopsys library itself. The db files may also contain any constraints that have been applied to the design.

In addition, all Synopsys tools understand the following HDL formats. DC is capable of reading or writing any of these formats.

1. Verilog
2. VHDL
3. EDIF

Today, Verilog and VHDL are the two main HDLs in use for coding a design. EDIF (Electronic Design Interchange Format) is primarily utilized for porting the gate-level netlist from one tool to another. EDIF was a popular choice a few years back. However, recently Verilog has gained popularity and dominance, prompted by its simple-to-read format and description. Most of the EDA tools today support both Verilog and EDIF.

VHDL in general is not used for porting the netlist from one vendor tool to another, since it requires the use of IEEE packages, which may vary between different tools. This language is essentially used for the purpose of coding the design and for system-level verification.

3.6 Data Organization

It is a good practice to organize files according to their formats. This facilitates automating the synthesis process. A common practice is to organize them using the following file extensions.

Script files: <filename>.scr
RTL Verilog file: <filename>.v
Synthesized Verilog netlist: <filename>.sv
RTL VHDL file: <filename>.vhd
Synthesized VHDL netlist: <filename>.svhd
EDIF file: <filename>.edf
Synopsys database file: <filename>.db
Reports: <filename>.rpt
Log files: <filename>.log

3.7 Design Entry

Before synthesis, the design must be entered in DC in the RTL format (although other formats also exist). DC provides the following two methods of design entry:

a) “read” command
b) “analyze/elaborate” command

Synopsys initially introduced the read command, which was then followed by the analyze/elaborate commands. The latter commands provide a faster and more powerful method of design entry than the read command and are recommended for RTL design entry.

The analyze and elaborate commands are two different commands, allowing designers to first analyze the design for syntax errors and RTL translation before building the generic logic for the design. The generic logic or GTECH components are part of the Synopsys generic technology-independent library. They are unmapped representations of boolean functions and serve as placeholders for the technology-dependent library.

The analyze command also stores the result of the translation in the specified design library (a UNIX directory) that may be used later. For example, a design analyzed previously may not need re-analysis and can merely be elaborated, thus saving time. In contrast, the read command performs the function of both the analyze and elaborate commands but does not store the analyzed results, therefore making the process slow in comparison.

Parameterized designs (such as those using the generic statement in VHDL) must use the analyze and elaborate commands in order to pass the required parameters while elaborating the design. The read command should be used for entering pre-compiled designs or netlists in DC.

The following table lists major differences between the read and analyze/elaborate commands for various categories:

In contrast to DC, PT uses different commands for design entry. PT, being a static timing analyzer, only works on mapped structural netlists. The design entry commands used by PT are described in Chapter 12.

3.8 Compiler Directives

Sometimes it is necessary to control the synthesis process from the HDL source itself. This control is primarily needed because of differences that may exist between the synthesis and the simulation environments. Other times, the control is needed simply to direct DC to map to certain types of components, or for embedding the constraints and attributes directly in the HDL source code.

DC provides a number of compiler directives targeted specifically for Verilog and VHDL design entry formats. These directives provide the means to control the outcome of synthesis directly from the HDL source code. The directives are specified as “comments” in the HDL code, but have specific meaning for DC. These special comments alter the synthesis process, but have no effect on the simulation.

The following sub-sections describe some of the most commonly used directives, both for Verilog and VHDL formats. For a complete list of directives, users are advised to refer to the Design Compiler Reference Manual.

3.8.1 HDL Compiler Directives

The HDL compiler directives refer to the translation process of RTL in Verilog format to the internal format used by Design Compiler. As stated above, specific aspects of the translation are controlled by “comments” within the Verilog source code. At the beginning of each directive is the regular Verilog comment // or /* followed by the keyword “synopsys” (all in lower case). Generally, users prefer the former style to specify HDL compiler directives. Therefore, to keep it simple, only the // style of comments for HDL directives is discussed in this section.

All comments beginning with // synopsys are assumed to contain only HDL compiler directives. DC displays an error if anything apart from HDL compiler directives (say, other comments or parts of Verilog code) is present after the // synopsys statement.

3.8.1.1 translate_off and translate_on Directives

These are some of the most useful and frequently used directives. They provide the means to instruct DC to stop translation of the Verilog source code from the start of “// synopsys translate_off”, and start the translation again after it reaches the next directive, “// synopsys translate_on”. These directives must be used in pairs, with the translate_off directive taking the lead.

Consider a scenario where part of the code present in the source RTL is meant solely for the purpose of dynamic simulation (or maybe the test bench is structured to make use of these statements). Example 3.2 illustrates such a scenario, which contains the Verilog `ifdef statement to facilitate setting parameters at the command line during simulation. Such code is clearly un-synthesizable, since the VENDOR_ID depends on the mode specified during simulation. Furthermore, since the HDL compiler cannot handle this statement, it issues an error stating that the design could not be read due to “Undefined macro ‘ifdef .....”

Example 3.2

`ifdef MY_COMPANY
  `define VENDOR_ID 16'h0083
`else
  `define VENDOR_ID 16'h0036
`endif

The translate_off and translate_on HDL directives may be used in this case to bypass the “simulation only” parts of the Verilog code as illustrated in Example 3.3. The resulting logic will contain the VENDOR_ID value pertaining to MY_COMPANY only. To change it to the other value, the user has to edit the code and move the HDL directives to make the other VENDOR_ID value visible.

Example 3.3

// synopsys translate_off
`ifdef MY_COMPANY
// synopsys translate_on
`define VENDOR_ID 16'h0083
// synopsys translate_off
`else
`define VENDOR_ID 16'h0036
`endif
// synopsys translate_on
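To see which lines of Example 3.3 remain visible for synthesis, the filtering can be mimicked in a few lines of Python. This is a simplified sketch of the behavior described in the text, not the HDL compiler's actual implementation; the function name is hypothetical and only the // comment style is handled.

```python
def visible_to_synthesis(source):
    """Drop every line between translate_off and translate_on directives."""
    kept, skipping = [], False
    for line in source.splitlines():
        text = line.strip()
        if text == "// synopsys translate_off":
            skipping = True            # stop translating from here
        elif text == "// synopsys translate_on":
            skipping = False           # resume translating after this line
        elif not skipping and text:
            kept.append(text)
    return kept


example_3_3 = """\
// synopsys translate_off
`ifdef MY_COMPANY
// synopsys translate_on
`define VENDOR_ID 16'h0083
// synopsys translate_off
`else
`define VENDOR_ID 16'h0036
`endif
// synopsys translate_on
"""

visible = visible_to_synthesis(example_3_3)
print(visible)
```

Only the `define for 16'h0083 survives, which matches the statement that the resulting logic contains the MY_COMPANY value.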


3.8.2 VHDL Compiler Directives

Similar to the HDL compiler, the VHDL compiler directives are special VHDL comments that affect the actions of the VHDL compiler. All VHDL compiler directives start with the VHDL comment (--), followed by either the synopsys or the pragma keyword. This gives the comment a special meaning to the compiler and compels it to perform the specified task.

3.8.2.1 translate_off and translate_on Directives

These directives work in a similar fashion to the ones described previously for the HDL compiler, with the exception that these require the VHDL comments as follows:

-- synopsys translate_off
-- synopsys translate_on
-- pragma translate_off
-- pragma translate_on

The VHDL compiler ignores any RTL code between the translate_off/on directives; however, it does perform a syntax check on the embedded code. To prevent the compiler from conducting syntax checks, the code must be made completely transparent. This can be achieved by setting the following variable to true:

hdlin_translate_off_skip_text = true

These directives are primarily used to block simulation-specific constructs in the VHDL code. For example, the user may have a library statement present in the netlist, which specifies the name of the library that contains the VITAL models of the gates present in the netlist. This means that for the purpose of simulation, the gates present in the netlist are being referenced from this library. Upon reading the VHDL code, DC produces an error, since the library statement is specific to simulation only. To circumvent this problem, one may envelop the library statement with the above directives to force DC to completely ignore the library statement.

3.8.2.2 synthesis_off and synthesis_on Directives

The synthesis_off/on directives work in a manner similar to the translate_off/on directives. The behavior of the synthesis_off/on directives themselves is not affected by the value of the hdlin_translate_off_skip_text variable. However, the translate_off/on directives perform exactly the same function if the value of that variable is set to false.

The above directives are the preferred approach for hiding simulation-only constructs. Although the VHDL compiler performs syntax checks on the code present within these directives, it ignores the code for the purpose of synthesis.

Syntactically, these directives may be used as follows:

-- pragma synthesis_off
<VHDL code goes here, used only for simulation>
-- pragma synthesis_on

3.9 Chapter Summary

This chapter introduced the reader to various terminology and concepts used by Synopsys.

Starting from a brief description and purpose of some of the tools provided by Synopsys, the chapter covered the Synopsys environment, including examples of startup files needed for PhyC, DC and PT, followed by the concepts of Objects, Variables and Attributes.

A brief introduction was also provided to the get_* commands and their usefulness. Different Synopsys formats were discussed along with design entry methods. The advantages and disadvantages of using the read versus the analyze/elaborate commands were also covered.

Finally, the chapter concluded by describing some of the most useful directives used by DC for the purpose of hiding simulation-only constructs.

Throughout the chapter, various examples were provided to facilitate the user in understanding these concepts.


4

SYNOPSYS TECHNOLOGY LIBRARY

The Synopsys technology library format has almost become the de-facto library standard. Its compact yet informative format allows adequate representation of the deep sub-micron technologies. The popularity of the Synopsys library format is evident from the fact that most place and route tools provide a direct translation of the Synopsys libraries, with almost a one-to-one mapping between the timing models in Synopsys libraries and the place and route timing models. A basic understanding of the library format and delay calculation methods is the key to successful synthesis.

Designers usually do not concern themselves with full details of the technology library as long as the library contains a variety of cells, each with different drive strengths. However, in order to optimize the design successfully, it is essential for designers to have a clear understanding of the delay calculation method used by DC along with the wire-load modeling and cell descriptions. It is therefore the intent of this chapter to describe the Synopsys technology library from the designer’s perspective, rather than discussing details about the structural and functional syntax of the library.

4.1 Technology Libraries

The Synopsys technology libraries can be separated into two broad classes:

1. Logic library
2. Physical library

4.1.1 Logic Library

The logic library contains information relevant only to the synthesis process and is used by DC for synthesis and optimization of the design. This information may include pin-to-pin timing, area, pin types and power along with other necessary data needed by DC. No physical information is present in the logic library.

The logic library is a text file (usually with extension “.lib”), which is compiled using the Library Compiler (LC) to generate a binary format with a “.db” extension.

4.1.2 Physical Library

The physical library contains the physical characteristics of the cells along with other necessary information relevant to Physical Compiler. Such information may contain data relating to the physical dimensions of cells, layer information, orientation of cells etc. For each logical cell, a corresponding physical cell should also be present.

The physical library is also a text file (usually with extension “.plib”) and is compiled by LC to generate a binary format with a “.pdb” extension. Synopsys has provided a useful utility called “lef2pdb” that takes the standard LEF (Library Exchange Format) file and the process technology file (also in LEF format) as input and converts them to the “pdb” format. The former file contains physical information about each cell in the design, whereas the process technology file contains information specific to a process such as number of layers, pitch, resistance, capacitance etc.

Following is the usage of this command.

lc_shell> lef2pdb -t tech.lef -l standard_cells.lef

A full explanation of the physical library and its syntax is beyond the scope of this book. Readers are advised to refer to the Physical Library Reference Manual for details.

4.2 Logic Library Basics

The logic library contains the following information:

a) Library group
b) Library level attributes
c) Environment description
d) Cell description

4.2.1 Library Group

The library group statement specifies the name of the library, followed by an open brace. The closing brace is the last entry in the library file. Anything between the open and closing braces is part of the library group description.

library (ex25) { /* start of library */

< library description >

}/* end of library*/

It is recommended that the file name and the technology library name be the same.

4.2.2 Library Level Attributes

The library level attributes are statements that apply to the library as a whole. These generally contain library features such as technology type, date, revision, and default values that apply to the entire library.

library (ex25) {
    technology (cmos);
    delay_model : table_lookup;
    date : "Feb 29,2000";
    revision : "1.0";
    current_unit : "1A";
    time_unit : "1 ns";
    voltage_unit : "1V";
    pulling_resistance_unit : "1kohm";
    capacitive_load_unit (1.0, pf);
    default_inout_pin_cap : 1.5 ;
    default_input_pin_cap : 1.0 ;
    default_output_pin_cap : 0.0 ;
    default_max_fanout : 10.0 ;
    default_max_transition : 3.0 ;
    default_operating_conditions : NOMINAL ;
    in_place_swap_mode : match_footprint ;
}

4.2.3 Environment Description

Environment attributes are defined in the library to model the variations of temperature, voltage and manufacturing processes. These consist of scaling factors (derating), timing range models and operating conditions. In addition, the environment description also contains wire-load models that are used by DC to estimate interconnect wiring delays.

4.2.3.1 Scaling Factors

The scaling factors or K-factors are multipliers that provide a means for derating the delay values based on the variations in process, voltage and temperature, or simply PVT. Only some of the K-factor statements are shown below as an example. Please refer to the Library Compiler reference manual for full details.

k_process_fall_transition : 1.0 ;
k_process_rise_transition : 1.2 ;
k_process_fall_propagation : 0.4 ;
k_process_rise_propagation : 0.4 ;
k_temp_fall_transition : 0.03 ;
k_temp_rise_transition : 0.04 ;
k_temp_fall_propagation : 1.2 ;
k_temp_rise_propagation : 1.24 ;
k_volt_fall_transition : 0.02 ;
k_volt_rise_transition : 0.5 ;
k_volt_fall_propagation : 0.9 ;
k_volt_rise_propagation : 0.85 ;
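As a rough illustration of how such multipliers derate a delay, the Python sketch below assumes the commonly cited linear form in which each of the process, temperature and voltage deltas scales the nominal delay by (1 + K × delta). The function name, the K values and the two operating points are hypothetical and are not taken from the listing above; the Library Compiler reference manual remains the authority on the exact equation.

```python
def derate(nominal_delay, k_process, k_temp, k_volt, nominal, target):
    """Scale a delay characterized at `nominal` PVT to the `target` PVT.

    Assumed form: D = D_nom * (1 + Kp*dP) * (1 + Kt*dT) * (1 + Kv*dV).
    """
    d_process = target["process"] - nominal["process"]
    d_temp = target["temperature"] - nominal["temperature"]
    d_volt = target["voltage"] - nominal["voltage"]
    return (nominal_delay
            * (1.0 + k_process * d_process)
            * (1.0 + k_temp * d_temp)
            * (1.0 + k_volt * d_volt))


# Hypothetical operating points and K-factors, for illustration only.
nominal = {"process": 1.0, "temperature": 25.0, "voltage": 3.00}
worst = {"process": 1.3, "temperature": 100.0, "voltage": 2.75}

unchanged = derate(1.0, 1.0, 0.002, -0.3, nominal, nominal)
slower = derate(1.0, 1.0, 0.002, -0.3, nominal, worst)
print(unchanged, slower)
```

With identical characterization and target conditions every delta is zero and the delay is returned unchanged; a slower process, higher temperature and lower voltage each push the delay up when the voltage K-factor is negative, as assumed here.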

4.2.3.2 Operating Conditions

Sets of operating conditions defined in the library specify the process, temperature, voltage and the RC tree model. These are used during synthesis and timing analysis of the design. A library is characterized using one set of operating conditions. During synthesis or timing analysis, if another set of operating conditions is specified, then DC uses the K-factors to derate the delay values based upon the specified operating conditions. Library developers may define any number of operating conditions in the library. Typically the following operating conditions are defined in the technology library:

operating_conditions (WORST) {
    process : 1.3 ;
    temperature : 100.0 ;
    voltage : 2.75 ;
    tree_type : worst_case_tree ;
}
operating_conditions (NOMINAL) {
    process : 1.0 ;
    temperature : 25.0 ;
    voltage : 3.00 ;
    tree_type : balanced_tree ;
}
operating_conditions (BEST) {
    process : 0.7 ;
    temperature : 0.0 ;
    voltage : 3.25 ;
    tree_type : best_case_tree ;
}

The process, temperature and voltage attributes have already been explained previously. The tree_type attribute defines the environmental interconnect model to be used. DC uses the value of this attribute to select the appropriate formula while calculating interconnect delays. The worst_case_tree attribute models the extreme case when the load pin is at the most distant end of a net from the driver. In this case the load pin incurs the full net capacitance and resistance. The balanced_tree model uses the case where all load pins are on separate but equal interconnect wires from the driver. The load pin in this case incurs an equal portion of the net capacitance and resistance. The best_case_tree models the case where the load pin is sitting right next to the driver. The load pin incurs only the net capacitance, without any net resistance.
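The three interconnect models can be compared with a simplified lumped-RC calculation. The Python sketch below is one reading of the descriptions above under stated assumptions (identical load pins, a single lumped wire resistance and capacitance); the text does not give the exact formulas DC uses, and the function name and numbers are hypothetical.

```python
def wire_delay(tree_type, r_wire, c_wire, c_pin, n_loads):
    """Estimate the wire RC delay seen by one load pin (simplified model)."""
    if tree_type == "best_case_tree":
        # Load sits next to the driver: no wire resistance, so no wire RC delay.
        return 0.0
    if tree_type == "balanced_tree":
        # Each load gets an equal share of the wire R and C and drives one pin.
        return (r_wire / n_loads) * (c_wire / n_loads + c_pin)
    if tree_type == "worst_case_tree":
        # Load at the far end: full wire R charging the full wire C and all pins.
        return r_wire * (c_wire + n_loads * c_pin)
    raise ValueError("unknown tree_type: %s" % tree_type)


# Hypothetical net: R = 0.2, C = 1.0, four loads of 0.04 each.
delays = {
    name: wire_delay(name, 0.2, 1.0, 0.04, 4)
    for name in ("best_case_tree", "balanced_tree", "worst_case_tree")
}
print(delays)
```

Whatever the exact formulas, the ordering is the point: the best-case model contributes the least wire delay and the worst-case model the most.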

4.2.3.3 Timing Range Models

The timing_range models provide the additional capability of computing arrival times of signals, based upon the specified operating conditions. This capability is provided by Synopsys to accommodate the fluctuations in operating conditions for which the design has been optimized. DC uses the timing ranges to evaluate the arrival times of the signals during timing analysis.

timing_range (BEST) {
    faster_factor : 0.5 ;
    slower_factor : 0.6 ;
}
timing_range (WORST) {
    faster_factor : 1.2 ;
    slower_factor : 1.3 ;
}

4.2.3.4 Wire-Load Models

The wire_load group contains information that DC utilizes to estimate interconnect wiring delays during the pre-layout phase of the design. Usually, several models appropriate to different sizes of the logic are included in the technology library. These models define the capacitance, resistance and area factors. In addition, the wire_load group also specifies slope and fanout_length for the logic under consideration.

The capacitance, resistance and area factors represent the wire capacitance, resistance and area respectively, per unit length of interconnect wire. The fanout_length attribute specifies values for the length of the wire associated with the number of fanouts. Along with fanout and length, this attribute may also contain values for other parameters, such as average_capacitance, standard_deviation and number_of_nets. These attributes and their values are written out automatically when generating wire-load models through DC. For manual creation, only the values for fanout and length are needed, using the fanout_length attribute. For nets whose fanout exceeds the largest value specified in the fanout_length attribute, the slope value is used to linearly extrapolate from the last fanout_length entry, in order to determine the length.

wire_load (SMALL) {
    resistance : 0.2 ;
    capacitance : 1.0 ;
    area : 0 ;
    slope : 0.5 ;
    fanout_length (1, 0.020) ;
    fanout_length (2, 0.042) ;
    fanout_length (3, 0.064) ;
    fanout_length (4, 0.087) ;

    fanout_length (1000, 20.0) ;
}
wire_load (MEDIUM) {
    resistance : 0.2 ;
    capacitance : 1.0 ;
    area : 0 ;
    slope : 1.0 ;
    fanout_length (1, 0.022) ;
    fanout_length (2, 0.046) ;
    fanout_length (3, 0.070) ;
    fanout_length (4, 0.095) ;

    fanout_length (1000, 30.0) ;
}
wire_load (LARGE) {
    resistance : 0.2 ;
    capacitance : 1.0 ;
    area : 0 ;
    slope : 1.5 ;
    fanout_length (1, 0.025) ;
    fanout_length (2, 0.053) ;
    fanout_length (3, 0.080) ;
    fanout_length (4, 0.110) ;

    fanout_length (1000, 40.0) ;
}

In addition to the wire_load groups, other attributes are defined in the library to automatically select the appropriate wire_load group, based on the total cell area of the logic under consideration.

wire_load_selection (AUTO_WL) {
    wire_load_from_area (0, 5000, "SMALL") ;
    wire_load_from_area (5000, 10000, "MEDIUM") ;
    wire_load_from_area (10000, 15000, "LARGE") ;
}

default_wire_load_selection : AUTO_WL ;
default_wire_load_mode : enclosed ;

It is recommended that the value of the default_wire_load_mode be set to “enclosed” or “segmented” instead of “top”. The wire load modes and their application are described in detail in Chapter 6.
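The area-based selection expressed by the wire_load_selection group amounts to a range lookup, sketched below in Python. The function name is hypothetical, and the treatment of the boundaries (lower bound inclusive, upper bound exclusive, no match above the last range) is an assumption made for the sketch, since the text does not say how DC resolves an area that sits exactly on a boundary.

```python
# Ranges copied from the AUTO_WL example: (min_area, max_area, model name).
AUTO_WL = [
    (0, 5000, "SMALL"),
    (5000, 10000, "MEDIUM"),
    (10000, 15000, "LARGE"),
]


def select_wire_load(total_cell_area, ranges=AUTO_WL):
    """Return the wire-load model whose area range contains the given area."""
    for min_area, max_area, name in ranges:
        if min_area <= total_cell_area < max_area:  # assumed boundary rule
            return name
    return None  # area outside every listed range


print(select_wire_load(3000), select_wire_load(5000), select_wire_load(12000))
```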

4.2.4 Cell Description

Each cell in the library contains a variety of attributes describing the function, timing and other information related to the cell. Rather than going into detail and describing all the attributes possible, only the relevant attributes and related information useful to designers are shown in the example below:

cell (BUFFD0) {
    area : 5.0 ;
    pin (Z) {
        max_capacitance : 2.2 ;
        max_fanout : 4.0 ;
        function : "I" ;
        direction : output ;
        timing () {

        }
        timing () {

        }
        related_pin : "I" ;
    }
    pin (I) {
        direction : input ;
        capacitance : 0.04 ;
        fanout_load : 2.0 ;
        max_transition : 1.5 ;
    }
}


The area attribute defines the cell area as a floating-point number without any units. It is followed by the pin descriptions and their related timing.

In addition, several design rule checking (DRC) attributes may be associated with each pin of the cell. These are:

fanout_load attribute for input pins.
max_fanout attribute for output pins.
max_transition attribute for input or output pins.
max_capacitance attribute for output or inout pins.

The DRC conditions are based on the vendor’s process technology and should not be violated. The DRC attributes define the conditions in which the cells of the library operate safely. In other words, cells are characterized under certain conditions (output loading, input slope etc.). Designs violating these conditions may have a severe impact on the normal operation of the cells, thereby causing the fabricated chip to fail.

Even though the previous example contains all four attributes, generally only two are used. In most cases, either fanout_load along with max_fanout, or max_transition along with max_capacitance, is used.

The fanout_load and max_fanout DRC attributes are related to each other, in that the sum of all fanout_load values at the input pins of the driven cells cannot exceed the max_fanout value at the output pin of the driver. Consider the cell (BUFFD0) shown in the previous example. This cell has a max_fanout value of 4.0 associated with the output pin Z, while the fanout_load value at its input is 2.0. This cell therefore cannot drive more than 2 cells of its own kind (BUFFD0), since

max_fanout (4) = fanout_load (2) of cell + fanout_load (2) of cell

If DRC violations occur, then DC replaces the driving cell with another that has a higher max_fanout value.

The max_transition attribute is generally applied to the input pin, whereas max_capacitance is applied to the output pin. Both attributes perform a function similar to that of the max_fanout and fanout_load attributes. The difference is that the max_transition attribute specifies that any net with a transition time greater than the max_transition value of the load pin cannot be connected to that pin. The max_capacitance attribute at the output pin specifies that the output pin of the driver cell cannot connect to any net whose total capacitance (interconnect and load pin capacitance) is greater than, or equal to, the maximum value defined at the output pin.

If DRC violations occur, then DC replaces the driving cell with another that has a higher max_capacitance value.

In addition, the output pin contains attributes defining the function of the pin, and the delay values related to the input pin. The input pin defines its pin capacitance and direction. The capacitance attribute should not be confused with the max_capacitance attribute. DC uses the capacitance attribute to perform delay calculations only, while max_capacitance, as explained above, is used for design rule checking.

It is also worthwhile to mention here that for sequential cells, the clock input pin uses another attribute (clock : true) that specifies that the input pin is of type “clock”. More details can be found in the Library Compiler Reference Manual.

The cell’s DRC attributes are often the most criticized part of the cell library. Library developers often find it impossible to satisfy everyone and are often blamed for not implementing the “right” numbers for these attributes. The problem arises because the library is, to a certain extent, dependent upon the coding style and chosen methodology. What works perfectly for one design may produce inadequate results for another design. It is therefore the intent of this section to briefly explain the solutions that designers may use to tailor the library to suit their needs.

In order to accommodate the design requirements, it is possible to change the values of the above DRC attributes on a per cell basis. However, it must be noted that the DRC attributes set in the library can only be tightened; they cannot be loosened. This can only be done if the attributes are pre-specified in the cell description. Users should realize that if these attributes are not already specified on the pin of the cell in the technology library, it is not possible to add these attributes on the pin from dc_shell.

For instance, to change the max_fanout value specified for pin Z of cell BUFFD0 (of library ex25 described previously) from 4.0 to 2.0, the following dc_shell-t command may be used:

dc_shell-t> set_attribute [get_pins ex25/BUFFD0/Z] max_fanout 2

One may also use wildcards in the above command to cover a variety of cells. This is useful for cases where a global change is required. For example, users may use the following command to change the max_fanout value on all cells with 0 drive strengths in the technology library:

dc_shell-t> set_attribute [get_pins ex25/*D0/Z] max_fanout 2

Similarly, the set_attribute command may be used to alter the value of other DRC attributes. The above command may be specified in the .synopsys_dc.setup file for global implementation.

4.3 Delay Calculation

Synopsys supports several delay models. These include the CMOS generic delay model, CMOS piecewise linear delay model and the CMOS non-linear table lookup model. Presently, the first two models are not in common use, due to their inefficiencies in representing the true delays caused by VDSM geometries. The non-linear delay model is the most prevalent delay model used in the ASIC world.

4.3.1 Delay Model

The non-linear delay model (NLDM) method uses a circuit simulator to characterize a cell’s transistors with a variety of input slew rates and output load capacitances. The results form a table, with input transition and output load capacitance as the deciding factors for calculating the resultant cell delay.

Figure 4-1 depicts the resulting delays and slew rates, interpolated to produce a non-linear delay model. The model’s accuracy depends on the precision and range of the chosen input slew rates and load capacitances.

If the delay point falls within the square (the table in the library), then the delay is computed using interpolation techniques. The values of the surrounding four points are used to determine the delay value, using numerical methods. The problem arises when any of the parameters fall outside the table. DC attempts to extrapolate the resulting delay, but often ends up with an extremely high value. This may be a blessing in disguise, since a high value is easily noticeable during static timing analysis, providing designers an opportunity to correct the situation.
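The use of the four surrounding table points can be sketched with bilinear interpolation. This is an assumption made for illustration; the exact numerical method used by the tool is not stated here. Let the input transition t lie between table entries t_1 and t_2, the output load c between c_1 and c_2, and let d_ij be the table delay at (t_i, c_j). All numeric values below are hypothetical.

```latex
\begin{align*}
\alpha &= \frac{t - t_1}{t_2 - t_1}, \qquad \beta = \frac{c - c_1}{c_2 - c_1} \\
d(t, c) &= (1-\alpha)(1-\beta)\, d_{11} + (1-\alpha)\beta\, d_{12}
          + \alpha(1-\beta)\, d_{21} + \alpha\beta\, d_{22} \\[4pt]
\text{With } & t_1 = 0.1,\ t_2 = 0.5,\ c_1 = 0.1,\ c_2 = 0.3,\quad
  d_{11} = 0.20,\ d_{12} = 0.40,\ d_{21} = 0.30,\ d_{22} = 0.60, \\
\text{and } & t = 0.3,\ c = 0.2: \qquad \alpha = \beta = 0.5 \\
d &= 0.25 \times (0.20 + 0.40 + 0.30 + 0.60) = 0.375
\end{align*}
```

When t or c lies outside the table, alpha or beta falls outside the range 0 to 1 and the same expression extrapolates, which is where unrealistically large values can appear.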

4.3.2 Delay Calculation Problems

The delay calculation of a cell is performed using the input transition time and the capacitive loading seen at the output. The input transition time of a cell is evaluated based upon the transition delay of the driving cell (previous cell). If the driving cell contains more than one timing arc, then the worst transition time is used as input to the driven cell. This directly impacts the static timing analysis and the generated SDF file for a design.

Consider the logic shown in Figure 4-2. The signals reset and signal_a are inputs to the instance U1. Let us presume that the reset signal is non-critical as compared to signal_a. The reset signal is a slow signal; therefore, the transition time of this signal is high as compared to signal_a. This causes two transition delays to be computed for cell U1 (2 ns from A to Z, and 0.3 ns from B to Z). When generating SDF, the two values will be written out separately as part of the cell delay for the cell U1. However, the question now arises: which of the two values does DC use to compute the input transition time for cell U2? DC uses the worst (maximum) transition value of the preceding gate (U1) as the input transition time for the driven gate (U2). Since the transition time of the reset signal is larger than that of signal_a, the 2 ns value will be used as the input transition time for U2. This causes a large delay value to be computed for cell U2 (shaded cell).

To avoid this problem one needs to inform DC not to perform the delay calculation for the timing arc from pin A to pin Z of cell U1. This step should be performed before writing out the SDF. The following dc_shell command may be used for this purpose:

dc_shell-t> set_disable_timing U1 -from A -to Z

Unfortunately, this problem also arises during static timing analysis. Failure to disable the timing computation of the false path leads to large delay values computed for the driven cell.

4.4 What is a Good Library?

Cell libraries determine the overall performance of the synthesized logic. A good cell library will result in a fast design with the smallest area, whereas a poor library will degrade the final result.

Historically, the cell libraries were schematic based. Designers would choose the appropriate cells and connect them manually to produce a netlist for the design. When the automatic synthesis engines became prevalent, the same schematic based libraries were converted and used for synthesis. However, since the synthesis engine relies on a number of factors for optimization, this approach almost always resulted in poor performance of the synthesized logic. It is therefore imperative that the cell library be designed solely with the synthesis approach in mind.

The following guidelines outline the specific kinds of cells in the technology library desired by the synthesis engine.

a) A variety of drive strengths for all cells.

b) Larger varieties of drive strengths for inverters and buffers.

c) Cells with balanced rise and fall delays (used for clock tree buffers and gated clocks).

d) Same logical function and its inversion as separate outputs, within the same physical cell (e.g., OR gate and NOR gate, as a single cell), again with a variety of drive strengths.

e) Same logical function and its inversion as separate cells (e.g., AND gate and NAND gate as two separate cells), with a variety of drive strengths.

f) Complex cells (e.g., AOI, OAI or NAND gate with one input inverted etc.) with a variety of high drive strengths.

g) High fanin cells (e.g., AOI with 6 inputs and one output) with a range of different drive strengths.

h) Variety of flip-flops with different drive strengths, both positive and negative-edge triggered.

i) Single or multiple outputs available for each flip-flop (e.g., Q only, or QN only, or both), each with a variety of drive strengths.

j) Flops to contain different inputs for Set and Reset (e.g., Set only, Reset only, no Set or Reset, both Set and Reset).

k) Variety of latches, both positive and negative-edge enabled, each with different drive strengths.

l) Several delay cells. These are useful when fixing the hold-time violations.

Using the above guidelines will result in a library optimized for the synthesis algorithm. This provides DC with the means to choose from a variety of cells to implement the best possible logic for the design.

It is worthwhile to note that the usage of high fanin cells, although useful in reducing the overall cell area, may cause routing congestion, which may inadvertently cause timing degradation and/or an increase in the area of the routed design. It is therefore recommended that these cells be used with caution.

Some designers prefer to exclude the low drive strengths for high fanin cells from the technology library. This again is based on the algorithm used by the routing engine and the type of placement (timing driven etc.) used by designers. If the router is not constrained, then it uses a method by which it associates a weight to each net of the design while placing cells. Depending upon the weight of the net, the cells are pulled towards the source having the highest weight. High fanin cells have a larger weight associated with their inputs (because of the number of inputs) compared to the weight associated with their outputs (single output). Therefore, the router will place these cells near the gates that are driving them. This will result in the high fanin cell being pulled away from the cell it is supposed to be driving, causing a long net to be driven by the high fanin cell. If the high fanin cell is not strong enough to drive this long net (large capacitance), then the result will be the computation of a large cell delay for the high fanin cell, as well as for the driven gate (because of the slow input transition time). By eliminating the low drive strengths of the high fanin cells from the technology library, this problem can be prevented after layout.

4.5 Chapter Summary

To summarize, this chapter described the contents of the Synopsys logic library from the designer’s perspective. The emphasis was placed upon the correct usage and understanding of the logic library, rather than focusing on details that are relevant only to library developers.

A brief discussion was also provided for the physical library that is used by the Physical Compiler. Emphasis was not placed on describing the syntax and functionality of this library due to the fact that the topic is beyond the scope of this book.

The chapter started with the basics of the logic library, with separate groups within the library. The relevant portions of each group were explained in detail. This included explanation of all attributes that the library uses to perform its task.

Special emphasis was given to describing the delay calculation method, along with operating conditions, wire-load modeling and cell description. At each step, the associated problems and workarounds were explained in detail.

Finally, suggestions were provided to the user as to what constitutes a good library optimized for the synthesis engine. This includes helpful hints that take into account the router behavior of the layout tool.

5

PARTITIONING AND CODING STYLES

Successful synthesis depends strongly on proper partitioning of the design, together with a good HDL coding style.

Logical partitioning is the key to successful synthesis (and place and route, if layout is hierarchical). Traditionally, designers partitioned the design in accordance with the functionality of each block, giving no thought to the synthesis process. With incorrect partitioning, the inflexible boundaries make optimization difficult and degrade the synthesis results. Partitioning the design correctly can significantly enhance the synthesized result. In addition, reduced compile time and simplified script management are also achieved.

A good coding style is imperative, not only for the synthesis process, but also for easy readability of the HDL code. Today, many designers only stress verifying the functionality of the design. Driven by time restrictions and/or lack of communication between the team members, designers do not have the luxury of carefully scrutinizing the HDL coding style. The fact remains, however, that a good coding style not only results in reduction of chip area and aids in top-level timing, but also produces faster logic.

5.1 Partitioning for Synthesis

Partitioning can be viewed as applying the “Divide and Conquer” concept to reduce complex designs into simpler, manageable blocks. Promoting design reuse is one of the most significant advantages of partitioning the design.

Apart from the ease in meeting timing constraints for a properly partitioned design, it is also convenient to distribute and manage different blocks of the design between team members.

The following recommendations help achieve the best synthesis results and a reduction in compile time.

a) Keep related combinational logic in the same module.

b) Partition for design reuse.

c) Separate modules according to their functionality.

d) Separate structural logic from random logic.

e) Limit blocks to a reasonable size (depends on the memory capacity of the machine).

f) Partition the top level (separate I/O Pads, Boundary Scan and core logic).

g) Do not add glue-logic at the top level.

h) Isolate state-machine from other logic.

i) Avoid multiple clocks within a block.

j) Isolate the block that is used for synchronizing multiple clocks.

k) WHILE PARTITIONING, THINK OF YOUR LAYOUT STYLE.

The group and ungroup commands provide the designer with the capability of altering the partitions in DC, after the design hierarchy has already been defined by the previously written HDL code. Figure 5-1 illustrates such an action.

The group command combines the specified instances into a separate block. In Figure 5-1, instances U1 and U2 are grouped together to form a sub-block named sub1, using the following commands:

dc_shell> current_design top

dc_shell> group {U1 U2} -design_name sub1

The ungroup command performs the opposite function. It is used to remove the hierarchy, as shown in Figure 5-1, by using the following commands:

dc_shell> current_design top

dc_shell> ungroup -all

The designer can also use the ungroup command along with the -flatten and -all options to flatten the entire hierarchy. This is illustrated below:

dc_shell> ungroup -flatten -all

5.2 What is RTL?

Today, RTL or the Register Transfer Level is the most popular form of high-level design specification. An RTL description of a design describes the design in terms of transformation and transfer of logic from one register to another. Logic values are stored in registers where they are evaluated through some combinational logic, and then re-stored in the next register.

RTL functions like a bridge between the software and hardware. It is text with strong graphical connotations – text that implies graphics or structure. It can be described as a technology independent, textual, structural description, similar to a netlist.

5.2.1 Software versus Hardware

A frequent obstacle to writing HDL code is the software mind-set. HDLs have evolved from logic netlist representations. HDLs in their initial form (the Register Transfer Level) were a forum to represent logic in a format independent from any particular technology library. A higher level of HDL abstraction is the behavioral level that allows the design to be independent of timing and explicit sequencing.

Frequently, the expectation is that the synthesis tool will synthesize the HDL to the minimal area and maximum performance, regardless of how the HDL is written. The problem remains that at a high level there are numerous ways of writing code to perform the same function. For example, a conditional expression could be written using case statements or if statements. Logically, these expressions are responsible for performing the same task, but when synthesized they can give drastically different results, as far as type of logic inferred, area, and timing are concerned. A reasonable caveat told to recent adopters of synthesis is – THINK HARDWARE!
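As a rough illustration of the if versus case point, the following Verilog sketch describes the same 4-to-1 selection in two ways. The module and signal names are hypothetical. The if/else-if chain is written as an ordered series of tests, which suggests a priority structure, while the case statement presents all branches as peers, which suggests a parallel multiplexer. What is actually produced depends on the tool, the constraints and the library.

```verilog
// Hypothetical example: the same 4-to-1 selection written two ways.
module mux4_styles (
    input  wire [1:0] sel,
    input  wire       a, b, c, d,
    output reg        y_if,    // described with an if/else-if chain
    output reg        y_case   // described with a case statement
);

    // if/else-if chain: the conditions are tested in order,
    // which describes a priority structure.
    always @(sel or a or b or c or d) begin
        if (sel == 2'b00)
            y_if = a;
        else if (sel == 2'b01)
            y_if = b;
        else if (sel == 2'b10)
            y_if = c;
        else
            y_if = d;
    end

    // case statement: all branches are peers,
    // which describes a parallel multiplexer.
    always @(sel or a or b or c or d) begin
        case (sel)
            2'b00:   y_case = a;
            2'b01:   y_case = b;
            2'b10:   y_case = c;
            default: y_case = d;
        endcase
    end

endmodule
```

Both outputs compute the same function of sel, so any difference after synthesis comes from the structure implied by the coding style rather than from the logic itself.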

5.3 General Guidelines

The following are general guidelines that every designer should be aware of. There is no fixed rule requiring adherence to these guidelines; however, following them vastly improves the performance of the synthesized logic, and may produce a cleaner design that is well suited for automating the synthesis process.

5.3.1 Technology Independence

HDL should be written in a technology independent fashion. Hard-coded instances of library gates should be minimized. Preference should be given to inference rather than instantiation. The benefit is that the RTL code can be implemented with any ASIC library and new technology through re-synthesis. This is especially important for synthesizable IP cores that are commonly used by many designs.

In cases where placement of library gates is unavoidable, all the instantiated gates may be grouped together to form their own module. This helps in management of library specific aspects of a design.

5.3.2 Clock Related Logic

a) Clock logic including clock gating logic and reset generation should be kept in one block – to be synthesized once and not touched again. This helps in a clean specification of the clock constraints. Another advantage is that the modules that are being driven by the clock logic can be constrained using ideal clock specifications.

b) Avoid multiple clocks per block – try keeping one clock per block. Such restrictions later help avoid difficulties that may arise while constraining a block containing multiple clocks. It also helps in managing clock skew at the physical level. Sometimes this becomes unavoidable, for instance where synchronization logic is present to sync signals from one clock domain to the other. For such cases, it is recommended that designers isolate the sync logic into a separate module for stand-alone synthesis. This includes setting a dont_touch attribute on the sync logic before instantiating it in the main block.

c) Clocks should be given meaningful names. A suggestion is to keep a name for the clock that reflects its functionality in addition to its frequency. Another good practice is to keep the same name for the clock, uniform throughout the hierarchy, i.e., the clock name should not change as it traverses through the hierarchy. This simplifies the script writing and helps in automating the synthesis process.

d) For DFT scan insertion, it is a requirement that the clocks be controlled from primary inputs. This may involve adding a mux at the clock source for controllability. Incorporate the mux logic within the module that contains all other clock logic. Isolating the clock logic block helps in fine tuning the synthesized logic in terms of gate size and the type of inference. If necessary, this small module can easily be hand tweaked to suit an optimal solution.
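A minimal sketch of such a clock block, assuming a design with one internally generated (divided) clock, might look as follows. The module name, port names and the divide-by-two function are all hypothetical and chosen only to show where the test-mode mux sits. In practice the mux may be an instantiated library cell rather than inferred logic.

```verilog
// Hypothetical clock-generation block; all names are illustrative.
module clk_gen (
    input  wire clk_in,     // clock from a primary input
    input  wire rst_n,      // active-low asynchronous reset
    input  wire test_mode,  // 1 = scan/test mode
    output wire clk_div2    // divided clock used by the core
);

    reg div2;

    // Internally generated clock: clk_in divided by two.
    always @(posedge clk_in or negedge rst_n)
        if (!rst_n)
            div2 <= 1'b0;
        else
            div2 <= ~div2;

    // In test mode the generated clock is bypassed, so the clock seen
    // by downstream flops is controlled directly from the primary input.
    assign clk_div2 = test_mode ? clk_in : div2;

endmodule
```

Keeping the divider and the mux together in one small module means the clock constraints and any hand tuning are confined to a single place.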

5.3.3 No Glue Logic at the Top

The top-level should only be used for connecting modules together. It should not contain any combinational glue logic. One of the benefits of this style is that it makes the very time consuming top-level compile redundant; the top level can now simply be stitched together without undergoing additional synthesis. Absence of glue logic at the top-level also facilitates layout, if performing hierarchical place and route.

5.3.4 Module Name Same as File Name

A good practice is to keep the module name (or entity name) the same as the file name. Never describe more than one module or entity in a single file. A single file should only contain a single module/entity definition for synthesis. This has enormous benefits in defining a clean methodology using scripting languages like PERL, AWK etc.

5.3.5 Pads Separate from Core Logic

Divide the top-level into two separate blocks, “pads” and “core”. Pads are usually instantiated and not inferred, therefore it is preferred that they be kept separate from the core logic. This simplifies the setting of the dont_touch attribute on all the pads of the design, simultaneously. By keeping the pads in a separate block, we are isolating the technology dependent part of the RTL code.
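The resulting structure can be sketched as below. This is an illustrative outline only: the port names are hypothetical, and simple assignments stand in for the library pad cells that a real pads block would instantiate. The three modules are shown together for brevity; following the file-naming guideline above, each would normally live in its own file.

```verilog
// Hypothetical pads block. In a real design this would contain
// instantiated library pad cells; assignments stand in for them here.
module pads (
    input  wire pad_clk, pad_din,
    output wire pad_dout,
    output wire clk, din,
    input  wire dout
);
    assign clk      = pad_clk;
    assign din      = pad_din;
    assign pad_dout = dout;
endmodule

// Hypothetical core block: all inferred, technology-independent logic.
module core (
    input  wire clk, din,
    output reg  dout
);
    always @(posedge clk)
        dout <= din;
endmodule

// Top level: two instances and wires only, no glue logic.
module top (
    input  wire pad_clk, pad_din,
    output wire pad_dout
);
    wire clk, din, dout;

    pads u_pads (.pad_clk(pad_clk), .pad_din(pad_din), .pad_dout(pad_dout),
                 .clk(clk), .din(din), .dout(dout));

    core u_core (.clk(clk), .din(din), .dout(dout));
endmodule
```

With this arrangement, only the pads module changes when the design is moved to a different technology library.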

5.3.6 Minimize Unnecessary Hierarchy

Do not create unnecessary hierarchy. Every hierarchy sets a boundary. Performance is degraded if unnecessary hierarchies are created. This is because DC is unable to optimize efficiently across hierarchies. One may use the ungroup command to flatten the unwanted hierarchies before compiling the design, to achieve better results.

5.3.7 Register All Outputs

This is a well-known Synopsys recommendation. The outputs of a block should originate directly from registers. Although not always practical, this coding/design style simplifies constraint specification and also helps optimization. This style prevents combinational logic from spanning module boundaries. It also increases the effectiveness of the characterize-write-script synthesis methodology by preventing the ping-pong effect that is common to this type of compilation technique.
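A minimal sketch of a block that follows this recommendation is shown below. The module name, the port names and the adder function are hypothetical; the point is only that the combinational logic stays inside the block and the output port is driven by a flip-flop.

```verilog
// Hypothetical block with a registered output.
module reg_out_block (
    input  wire       clk,
    input  wire [7:0] a, b,
    output reg  [7:0] sum_q   // output driven directly by a register
);

    // Combinational logic stays inside the module boundary.
    wire [7:0] sum_d = a + b;

    // The output port originates from this register.
    always @(posedge clk)
        sum_q <= sum_d;

endmodule
```

A block that receives sum_q then sees a path that starts at a register, so its input constraints do not depend on the depth of the logic inside this module.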

5.3.8 Guidelines for FSM Synthesis

The following guidelines are presented for writing finite state machines that may help in optimizing the logic:

a) State names should be described using “enumerated types” in VHDL, or “parameters” in Verilog.

b) Combinational logic for computing the next state should be in its own process or always block, separate from the state registers.

c) Implement the next-state combinational logic with a case statement.
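The three guidelines can be sketched in Verilog as follows. The state names, ports and transitions are hypothetical and serve only to show the structure: parameters for the state names, a state register in one always block, and next-state logic in a separate always block written as a case statement.

```verilog
// Hypothetical three-state machine illustrating guidelines a) to c).
module fsm_sketch (
    input  wire clk,
    input  wire reset,    // synchronous, active high
    input  wire start,
    input  wire done,
    output wire busy
);

    // a) State names described as parameters.
    parameter IDLE = 2'b00,
              RUN  = 2'b01,
              STOP = 2'b10;

    reg [1:0] state, next_state;

    // State register, kept separate from the next-state logic.
    always @(posedge clk)
        if (reset)
            state <= IDLE;
        else
            state <= next_state;

    // b) and c) Next-state logic in its own always block, as a case.
    always @(state or start or done) begin
        case (state)
            IDLE:    next_state = start ? RUN  : IDLE;
            RUN:     next_state = done  ? STOP : RUN;
            STOP:    next_state = IDLE;
            default: next_state = IDLE;
        endcase
    end

    assign busy = (state == RUN);

endmodule
```

The default branch assigns next_state for the unused encoding, so the combinational block is fully specified.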

5.4 Logic Inference

Hardware Description Languages (HDLs) like VHDL and Verilog are front-ends to synthesis. HDLs allow a design to be represented in a technology independent fashion. However, synthesis imposes certain restrictions on the manner in which the HDL description of a design is written. Not all HDL constructs can be synthesized. Not only that, synthesis expects HDLs to be coded in a specific way so as to get the desired results. One can say that synthesis is template driven – if the code is written using the templates that are understood and expected by the synthesis tool, then the results will be correct and predictable. The templates and other coding patterns for synthesis are called coding styles. For quality results it is imperative that designers possess a keen understanding of the coding styles, logic inferences, and the corresponding logic structures that DC generates.

5.4.1 Incomplete Sensitivity Lists

This is one of the most common mistakes made by designers. Incomplete sensitivity lists may cause simulation mismatches between the source RTL and the synthesized logic. DC issues a warning for signals that are present in the process or always block, but are absent from the sensitivity list. This is primarily a simulation problem, since the process does not trigger when the missing signal changes. The synthesized logic is in most cases correct for blocks containing incomplete sensitivity lists. However, it is strongly recommended that designers pay special attention to the sensitivity list and complete it in order to eliminate any surprises at the end of the synthesis cycle.

Verilog Example

always @(weekend or go_to_beach or go_to_work)
begin
    if (weekend)
        action = go_to_beach;
    else if (weekday)
        action = go_to_work;
end

VHDL Example

process (weekend, go_to_beach, go_to_work)
begin
    if (weekend) then
        action <= go_to_beach;
    elsif (weekday) then
        action <= go_to_work;
    end if;
end process;

The examples illustrated above do not contain the signal “weekday” in their sensitivity lists. The synthesized logic may still be accurate; however, during simulation the process will not trigger each time the signal “weekday” changes value. This may cause a mismatch between the simulation result of the source RTL and the synthesized logic.
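For comparison, a version of the Verilog example with a complete sensitivity list is sketched below. The surrounding module and port declarations are assumptions added to make the fragment self-contained. As in the original example there is no final else branch, so the block still holds its value when neither condition is true.

```verilog
// Hypothetical wrapper module around the example above.
module weekend_plan (
    input  wire weekend, weekday,
    input  wire go_to_beach, go_to_work,
    output reg  action
);

    // Every signal read inside the block appears in the sensitivity
    // list, including weekday, so simulation re-evaluates the block
    // whenever any of them changes.
    always @(weekend or weekday or go_to_beach or go_to_work) begin
        if (weekend)
            action = go_to_beach;
        else if (weekday)
            action = go_to_work;
    end

endmodule
```

In Verilog-2001, writing the sensitivity list as @* asks the simulator to include every signal read in the block, which avoids this class of omission where the tools in use support it.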

5.4.2 Memory Element Inference

There are two types of memory elements – latches and flip-flops. Latches are level-sensitive memory elements, while flip-flops in general are edge-sensitive. Latches are transparent as long as the enable to the latch is active. At the time the latch is disabled, it holds the value present at the D input, at its Q output. Flip-flops on the other hand, respond to the rising or falling edge of the clock.

Latches are simple devices, therefore they cover less area as compared to their counterparts, flip-flops. However, latches in general are more troublesome because their presence in a design makes DFT scan insertion difficult, although not impossible. It is also complicated to perform static timing analysis on designs containing latches, due to their ability of being transparent when enabled. For this reason, designers generally prefer flip-flops to latches.

The following sub-sections provide detailed information on how to avoid latches, as well as how to infer them, if desired.

5.4.2.1 Latch Inference

A latch is inferred when a conditional statement is incompletely specified. An if statement with a missing else part is an example of an incompletely specified conditional. Here is an example, both in Verilog and VHDL:

Verilog Example

always @(weekend or go_to_beach)
begin
    if (weekend)
        action <= go_to_beach;
end

VHDL Example

process (weekend, go_to_beach)
begin
    if (weekend = '1') then
        action <= go_to_beach;
    end if;
end process;

The above statement will cause DC to infer a latch enabled by a signal called “weekend”. In the above example, “action” is not given any value when the signal “weekend” is 0. Always cover all the cases in order to avoid unintentional latch inference. This may be achieved by using an else statement, or by using a default statement outside the if branch.
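Both remedies can be sketched in Verilog as follows. The module, the stay_home input and the two output names are hypothetical, introduced only so that the two styles can be shown side by side.

```verilog
// Hypothetical module showing two ways to avoid the latch.
module no_latch (
    input  wire weekend,
    input  wire go_to_beach, stay_home,
    output reg  action_else,     // uses an explicit else branch
    output reg  action_default   // uses a default assignment
);

    // Option 1: an else branch assigns a value when weekend is 0.
    always @(weekend or go_to_beach or stay_home) begin
        if (weekend)
            action_else = go_to_beach;
        else
            action_else = stay_home;
    end

    // Option 2: a default assignment before the if statement;
    // the if branch overrides it when weekend is 1.
    always @(weekend or go_to_beach or stay_home) begin
        action_default = stay_home;
        if (weekend)
            action_default = go_to_beach;
    end

endmodule
```

In both blocks the output is assigned on every path through the code, so no storage is implied.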


A latch may also get inferred from an incompletely specified case statement in Verilog.

`define sunny 2'b00
`define snowy 2'b01
`define windy 2'b10

wire [1:0] weather;

case (weather)
    `sunny : action <= go_motorcycling;
    `snowy : action <= go_skiing;
    `windy : action <= go_paragliding;
endcase

In the above case statement only 3 of the 4 possible values of “weather” are covered. This causes a latch to be inferred on the signal “action”. Note that for the above example the Synopsys full_case directive may also be used to avoid the latch inference, as explained in Chapter 3. The following example contains the default statement that provides the fourth condition, thereby preventing the latch inference.

case (weather)
    `sunny : action <= go_motorcycling;
    `snowy : action <= go_skiing;
    `windy : action <= go_paragliding;
    default : action <= go_paragliding;
endcase

VHDL does not allow incomplete case statements. This often means that the others clause must be used; consequently the above problem does not occur in VHDL. However, latches may still be inferred by VHDL, if a particular output signal is not assigned a value in each branch of the case statement. The implication is that outputs must be assigned a value in all branches to prevent latch inference in VHDL.

case (weather) is
    when sunny => action <= go_motorcycling;
    when snowy => action <= go_skiing;
    when windy => action <= go_paragliding;
    when others => null;
end case;

The above example, although containing the others clause, will infer latches because the output signal “action” is not assigned a particular value in the others clause. To prevent this, all branches should be completely specified, as follows:

case (weather) is
    when sunny => action <= go_motorcycling;
    when snowy => action <= go_skiing;
    when windy => action <= go_paragliding;
    when others => action <= go_paragliding;
end case;

5.4.2.2 Register Inference

DC provides a wide variety of templates for register inference. This is to support different edge-types of the clock and reset mechanisms. A register is inferred when there is an edge specified in the sensitivity list. The edge could be a positive edge or a negative edge.

5.4.2.2.1 Register Inference in Verilog

In Verilog, a register is inferred when an edge is specified in the sensitivity list of an always block. One register is inferred for each of the variables assigned in the always block. All variable assignments not directly dependent on the clock edge should be made in a separate always block, which does not have an edge specification in its sensitivity list.

A plain and simple positive edge-triggered D flip-flop is inferred using the following template:

always @(posedge clk)
  reg_out <= data;

In order to infer registers with resets, the reset signal is added to the sensitivity list with its own edge specification, with reset logic coded within the always block. Following is an example of a D flip-flop with an asynchronous reset:

always @(posedge clk or posedge reset)
  if (reset)
    reg_out <= 1'b0;
  else
    reg_out <= data;

Having a synchronous reset is a simple matter of removing the “reset” signal from the sensitivity list. In this case, since the block responds only to the clock edge, the reset is also recognized only at the clock edge.

always @(posedge clk)
  if (reset)
    reg_out <= 1'b0;
  else
    reg_out <= data;
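The difference between the two reset templates can be illustrated with a small behavioral model. This is a Python sketch for intuition only, not synthesizable code; the function names and the clk_rising flag are hypothetical stand-ins for the events in the sensitivity list.

```python
def dff_async_reset(q, clk_rising, reset, data):
    """Asynchronous reset: the output clears as soon as reset is high, clock or not."""
    if reset:
        return 0
    if clk_rising:
        return data
    return q  # no event: hold the stored value


def dff_sync_reset(q, clk_rising, reset, data):
    """Synchronous reset: reset is only looked at on a rising clock edge."""
    if clk_rising:
        return 0 if reset else data
    return q  # no clock edge: hold the stored value, even if reset is high


# Reset asserted between clock edges, register currently holding 1.
print(dff_async_reset(q=1, clk_rising=False, reset=1, data=1))
print(dff_sync_reset(q=1, clk_rising=False, reset=1, data=1))
```

With reset asserted and no clock edge, only the asynchronous version clears its output; the synchronous version waits for the next rising edge.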

A negative edge-triggered flop may be inferred by using the following template:

always @(negedge clk)
  reg_out <= data;

Absence of a negative edge-triggered flop in the technology library results in DC inferring a positive edge-triggered flop with an additional inverter to invert the clock signal.

5.4.2.2.2 Register Inference in VHDL

In VHDL a register is inferred when an edge is specified in the process body. The following example illustrates the VHDL template to infer a D flip-flop:

reg1: process (clk)
begin
  if (clk'event and clk = '1') then
    reg_out <= data;
  end if;
end process reg1;

DC does not infer latches for variables declared inside functions, since variables declared inside functions are reassigned each time the function is called.

Coding style templates for registers with asynchronous and synchronous resets are similar in nature to the Verilog templates shown in the previous section.

A negative edge-triggered flop may be inferred by using the following template:

reg1: process (clk)
begin
  if (clk'event and clk = '0') then
    reg_out <= data;
  end if;
end process reg1;

Absence of a negative edge-triggered flop in the technology library results in DC inferring a positive edge-triggered flop with an additional inverter to invert the clock signal.

5.4.3 Multiplexer Inference

Depending upon the design requirements, the HDL may be coded in different ways to infer a variety of architectures using muxes. These may consist of a single mux with all inputs having the same delay to reach the output, or a priority encoder that uses a cascaded structure of muxes to prioritize the input signals.

The correct use of if and case statements is a complex topic that is outside the scope of this chapter. There are application notes (from Synopsys) and other published materials currently available that explain the proper usage of these statements. It is therefore the intent of this chapter to refer the users to outside sources for this information. Only a brief discussion is provided in the following sub-sections.

5.4.3.1 Use case Statements for Muxes

In general, if statements are used for latch inference and priority encoders, while case statements are used for implementing muxes. It is recommended to infer muxes exclusively through case statements. The if statements may also be effectively used to prioritize late arriving signals. This kind of prioritizing may be implementation dependent. It also limits reusability.

To prevent latch inference in case statements the default part of the case statement should always be specified. For example, in case of a state machine, the default action could be that all states covered by the default clause cause a jump to the “start” state. Having a default clause in the case statement is the preferred way to write case statements, since it makes the HDL independent of the synthesis tool. Using directives like full_case and parallel_case makes the RTL code dependent on the synthesis tool. These directives should be avoided.

If the default action is to assign don’t-cares, then a difference in behavior between RTL simulation and synthesized result may occur. This is because DC may optimize the don’t-cares randomly, causing the resulting logic to differ.

5.4.3.2 if versus case Statements – A Case of Priorities

Multiple if statements with multiple branches result in the creation of a priority encoder structure.

always @(weather or go_to_work or go_to_beach)
begin
  if (weather[0]) action = go_to_work;
  if (weather[1]) action = go_to_beach;
end

In the above example, the signal “weather” is a two-bit input signal and is used to select the two inputs, “go_to_work” and “go_to_beach”, with “action” as the output. When synthesized, the cascaded mux structure of the priority encoder is produced as shown in Figure 5-2.

If the above example used the case statement (instead of multiple if statements) in which all possible values of the selection index were covered and were exclusive, then it would have resulted in a single multiplexer as shown in Figure 5-3.

The same structure (Figure 5-3) is produced if a single if statement is used, along with elsif statements to cover all possible branches.
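The priority implied by the two independent if statements in the Verilog example of this section can be seen in a small behavioral model. This is a Python sketch for illustration only; the function name and the previous_action argument are hypothetical.

```python
def multiple_ifs(weather, go_to_work, go_to_beach, previous_action):
    """Model of two independent if statements evaluated in source order."""
    action = previous_action      # nothing assigned yet: the old value is kept
    if weather & 0b01:            # if (weather[0])
        action = go_to_work
    if weather & 0b10:            # if (weather[1]) comes later, so it wins
        action = go_to_beach
    return action


for weather in (0b00, 0b01, 0b10, 0b11):
    print(format(weather, "02b"), multiple_ifs(weather, "work", "beach", "held"))
```

When both bits of the selector are set, the later statement overrides the earlier one, which is the priority that the cascaded muxes implement. Note also that for the value 00 neither statement assigns “action”, so the old value is held; as discussed under latch inference, this is the kind of unassigned path that should be covered.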

5.4.4 Three-State Inference

Tri-state logic is inferred when high impedance (Z) is assigned to an output. Arbitrary use of tri-state logic is generally not recommended because of the following reasons:

a) Tri-state logic reduces testability.

b) Tri-state logic is difficult to optimize, since it cannot be buffered. This can lead to max_fanout violations and heavily loaded nets.

On the upside however, tri-state logic can provide significant savings in area.

Verilog Example

assign tri_out = enable ? tri_in : 1'bz;

VHDL Example

tri_out <= tri_in when (enable = '1') else 'Z';

5.5 Order Dependency

Both Verilog and VHDL provide variable assignments that are either order dependent or order independent. Correct usage of these produces the desired result, while incorrect usage may cause the synthesized logic to behave differently from the source RTL.

5.5.1 Blocking versus Non-Blocking Assignments in Verilog

It is important to use non-blocking statements when doing sequential assignments like pipelining and modeling of several mutually exclusive data transfers. Use of blocking assignments within sequential processes may cause race conditions, because the final result depends on the order in which the assignments are evaluated. The non-blocking assignments are order independent; therefore they match closely to the behavior of the hardware.

Non-blocking assignment is done using the “<=” operator, while the “=” operator is used for blocking assignments.

always @(posedge clk)
begin
  firstReg  <= data;
  secondReg <= firstReg;
  thirdReg  <= secondReg;
end

In hardware, the register updates effectively occur in the reverse of the order shown above. The use of non-blocking assignments causes the assignments to occur in the same manner as hardware, i.e., thirdReg will get updated with the old value of secondReg and secondReg will get updated with the old value of firstReg. If blocking assignments were used in the above example, then the signal “data” would have propagated all the way through to thirdReg on a single clock edge during simulation.

The blocking assignments should generally be used within the combinational always block.
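The two behaviors can be compared with a small model of one clock edge on the three-register example. This is a Python sketch for intuition only; the function names are hypothetical, and the tuple holds (firstReg, secondReg, thirdReg).

```python
def clock_edge_nonblocking(regs, data):
    """All right-hand sides are sampled first, then every register updates together."""
    first, second, third = regs
    return (data, first, second)


def clock_edge_blocking(regs, data):
    """Each assignment takes effect immediately, in source order."""
    first, second, third = regs
    first = data
    second = first   # already sees the new value of first
    third = second   # already sees the new value of second
    return (first, second, third)


start = (0, 0, 0)
print(clock_edge_nonblocking(start, 1))   # data advances one stage per clock
print(clock_edge_blocking(start, 1))      # data reaches the last stage at once
```

After one clock edge the non-blocking version has moved “data” into firstReg only, as a three-stage pipeline should, while the blocking version has copied it into all three registers.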

5.5.2 Signals versus Variables in VHDL

Similar to Verilog, VHDL also provides order dependency through the use of signals and variables. The signal assignments may be equated to Verilog’s non-blocking assignments, i.e., they are order independent. The variable assignments are order sensitive and correlate to Verilog’s blocking assignments.

Variable assignments are done using the “:=” operator, whereas the “<=” operator is used for signal assignments.

The following example illustrates the usage of the signal assignments within the sequential process block. The resulting hardware contains three registers, with signal “data” propagating from firstReg to secondReg and then to thirdReg. The RTL simulation will also show the same result.

process (clk)
begin
  if (clk'event and clk = '1') then
    firstReg  <= data;
    secondReg <= firstReg;
    thirdReg  <= secondReg;
  end if;
end process;

A general recommendation is to only use signal assignments within sequential processes and variable assignments within the combinational processes.

5.6 Chapter Summary

This chapter highlighted the partitioning and coding styles suited for synthesis. Various guidelines and suggestions were provided to help the user code the RTL correctly with proper partitions, to make effective use of the synthesis engine.

The chapter began with suggestions on successful partitioning techniques and why they are necessary, followed by a short discussion on “what is RTL”. Emphasis was placed on “thinking hardware” while coding the design.

Next, general guidelines were covered that encompassed various suggestions and techniques that, though not essential for synthesis, have significant impact on successful optimization. Adherence to these suggestions produces optimized designs that are well suited for automating the synthesis process.

An important section was devoted to the coding styles, and numerous examples were provided as templates to infer the correct logic. These included inference of latches, registers, multiplexers and three-state logic elements. At each step, advantages and disadvantages along with the correct usage were discussed.

The last section described the order dependency feature of both Verilog and VHDL languages. Also discussed were appropriate coding techniques that make use of the order dependency feature of both languages.

6

CONSTRAINING DESIGNS

This chapter discusses the process of specifying the design environment and its constraints. It describes various commonly used DC commands along with other helpful constraints that may be used to synthesize complex ASIC designs.

Please note that the commands described in this chapter only contain the most frequently used options. Designers are advised to consult the DC reference manual for the entire list of options available to a particular command.

This chapter contains information that is useful both for the novice and the advanced users of Synopsys tools. The chapter attempts to focus on “real world” applications, by taking into account deviations from the ideal situation. In other words, “Not all designs or designers follow Synopsys recommendations”. Incorporated within the chapter are numerous helpful ideas, marked to guide the reader in real-time application of selected commands.

6.1 Environment and Constraints

In order to obtain optimum results from DC, designers have to methodically constrain their designs by describing the design environment, target objectives and design rules. The constraints may contain timing and/or area information, usually derived from design specifications. DC uses these constraints to perform synthesis and tries to optimize the design with the aim of meeting target objectives.

6.1.1 Design Environment

Up until now, the assumption has been that the design has been partitioned, coded and simulated. The next step is to describe the design environment. This procedure entails defining for the design, the process parameters, I/O port attributes, and statistical wire load models. Figure 6-1 illustrates the essential DC commands used to describe the design environment.

set_min_library This is a new command, introduced in the DC98 version. The command allows users to simultaneously specify the worst-case and the best-case libraries. This may be useful during initial compiles, preventing DC from introducing setup-time violations while fixing the hold-time violations.

set_min_library <max library filename>
                -min_version <min library filename>

dc_shell-t> set_min_library "ex25_worst.db" \
                -min_version "ex25_best.db"

The above command may be used for fixing hold-time violations during incremental compile or for in-place optimization. In this case, the user should set both minimum and maximum values for the operating conditions.

set_operating_conditions describes the process, voltage and temperature conditions of the design. The Synopsys library contains the description of these conditions, usually described as WORST, TYPICAL and BEST case. The names of operating conditions are library dependent. Users should check with their library vendor for the correct setting. By changing the value of the operating condition command, full ranges of process variations are covered. The WORST case operating condition is generally used during the pre-layout synthesis phase, thereby optimizing the design for maximum setup-time. The BEST case condition is commonly used to fix the hold-time violations. The TYPICAL case is mostly ignored, since analysis at WORST and BEST case also covers the TYPICAL case.

set_operating_conditions <name of operating conditions>

dc_shell-t> set_operating_conditions WORST

It is possible to optimize the design with both the WORST and the BEST case simultaneously. The optimization is achieved by using the -min and -max options in the above command, as illustrated below. This is very useful for fixing the design for possible hold-time violations.

dc_shell-t> set_operating_conditions -min BEST -max WORST

set_wire_load_model command is used to provide estimated statistical wire-load information to DC, which in turn uses the wire-load information to model net delays as a function of loading. Generally, a number of wire-load models are present in the Synopsys technology library, each representing a particular size block. In addition, designers may also choose to create their own custom wire-load models to accurately model the net loading of their blocks.

set_wire_load_model -name <wire-load model>

dc_shell-t> set_wire_load_model -name MEDIUM

set_wire_load_mode defines the three modes associated with modeling wire loads. These are top, enclosed, and segmented. Generally, only the first two modes are in common use. The segmented wire-load mode is not prevalent, since it relies on the wire-load models that are specific to the net segments.

The mode top defines that all nets in the hierarchy will inherit the same wire-load model as the top-level block. One may choose to use this wire-load model for sub-blocks, if planning to flatten them later for layout. This mode may also be chosen if the user is synthesizing the design using the bottom-up compile method.

The second mode, enclosed, specifies that all nets (of the sub-blocks) will inherit the wire-load model of the block that completely encloses the sub-blocks. For example, if the designer is synthesizing sub-blocks B and C that are completely enveloped by block A (which in turn is completely enclosed by the top-level), then sub-blocks B and C will inherit the wire-load models defined for block A.

The last mode, segmented, is used for wires crossing hierarchical boundaries. In the above example, sub-blocks B and C will inherit the wire-load models specific to them, while the nets between sub-block B and C (but within block A) will inherit the wire-load model specified for block A.

set_wire_load_mode < top | enclosed | segmented >

dc_shell-t> set_wire_load_mode top

It is extremely important that designers accurately model the wire loads of their design. Too optimistic or too pessimistic wire-load models result in increased synthesis iterations, in an effort to achieve timing convergence after layout. In general, during the pre-layout phase, slightly pessimistic wire-load models are used. This is done to provide extra timing margin that may get absorbed by the routed design.

set_drive and set_driving_cell are used at the input ports of the block. The set_drive command is used to specify the drive strength at the input port. It is typically used to model the external drive resistance to the ports of the block or chip. The value of 0 signifies highest drive strength and is commonly utilized for clock ports. Conversely, set_driving_cell is used to model the drive resistance of the driving cell to the input ports. This command takes the name of the driving cell as its argument and applies all design rule constraints of the driving cell to the input ports of the block.

set_drive <value> <object list>

set_driving_cell -cell <cell name>
                 -pin <pin name> <object list>

dc_shell-t> set_drive 0 {CLK RST}

dc_shell-t> set_driving_cell -cell BUFF1 -pin Z [all_inputs]

set_load sets the capacitive load in the units defined in the technology library (usually picofarads, or pF), to the specified nets or ports of the design. It typically sets capacitive loading on output ports of the blocks during pre-layout synthesis, and on nets, for back-annotating the extracted post-layout capacitive information.

set_load <value> <object list>

dc_shell-t> set_load 1.5 [all_outputs]

dc_shell-t> set_load 0.3 [get_nets blockA/n1234]

Design Rule Constraints or DRCs consist of set_max_transition, set_max_fanout and set_max_capacitance commands. These rules are generally set in the technology library and are determined by the process parameters. These rules should not be violated in order to achieve working silicon. Previous releases of DC (v97.08 and before) prioritized DRCs even at the expense of poor timing. However, the latest version, DC98, prioritizes timing requirements over DRCs.

The DRC commands can be applied to input ports, output ports or on the current_design. Furthermore, if the value set in the technology library is not adequate or is too optimistic, then these commands may also be used at the command line, to control the buffering in the design.

set_max_transition <value> <object list>

set_max_capacitance <value> <object list>

set_max_fanout <value> <object list>

dc_shell-t> set_max_transition 0.3 [current_design]

dc_shell-t> set_max_capacitance 1.5 [get_ports out1]

dc_shell-t> set_max_fanout 3.0 [all_outputs]

6.1.2 Design Constraints

Design constraints describe the goals for the design. They may consist of timing or area constraints. Depending on how the design is constrained, DC tries to meet the set objectives. It is imperative that designers specify realistic constraints, since unrealistic specification results in excess area, increased power and/or degradation in timing. The basic commands to constrain a design are shown in Figure 6-2.

create_clock command is used to define a clock object with a particular period and waveform. The -period option defines the clock period, while the -waveform option controls the duty cycle and the starting edge of the clock. This command is applied to pin or port object types.

The following example specifies that the port named CLK is of type “clock” that has a period of 40 ns, with 50% duty cycle. The positive edge of the clock starts at time 0 ns, with the falling edge occurring at 20 ns. By changing the falling edge value, the duty cycle of the clock may be altered.

dc_shell-t> create_clock -period 40 -waveform [list 0 20] CLK
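The relation between the waveform edges and the duty cycle is simple arithmetic. The following Python sketch (the helper name duty_cycle is hypothetical, and it assumes one rising and one falling edge per period with rise < fall) reproduces the 50% figure of the example and shows the effect of moving the falling edge.

```python
def duty_cycle(period, rise, fall):
    """Fraction of the period for which the clock is high (assumes rise < fall <= period)."""
    return (fall - rise) / period


print(duty_cycle(40, 0, 20))   # waveform [list 0 20] from the example above
print(duty_cycle(40, 0, 10))   # an earlier falling edge gives a narrower high phase
```

Here a falling edge at 10 ns, with the same 40 ns period, would describe a clock that is high for a quarter of the cycle.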

In some cases, a block may only contain combinational logic. To define delay constraints for this block, one can create a virtual clock and specify the input and output delays in relation to the virtual clock. To create a virtual clock, designers may replace the port name (CLK, in the above example) with the -name <virtual clock name>, in the above command. Alternatively, one can use the set_max_delay or set_min_delay commands to constrain such blocks. This is explained in detail in the next section.

create_generated_clock command is used for clocks that are generated internal to the design. This is a very powerful command, which until recently only existed in PrimeTime. This command may be used to describe frequency divided/multiplied clocks as a function of the primary clock.

create_generated_clock -name <clock name>
                       -source <clock source>
                       -divide_by <factor> | -multiply_by <factor>

set_dont_touch_network is a very useful command, usually used for clock networks and resets. This command is used to set a dont_touch property on a port, or on the net. Note that setting this property will also prevent DC from buffering the net in order to meet DRCs. In addition, any gate coming in contact with the “dont_touched” net will also inherit the dont_touch attribute.

dc_shell-t> set_dont_touch_network {CLK RST}

Suppose you have a block that takes as input the primary clock, and generates secondary clocks, e.g., clock divider logic. In this scenario, you should apply set_dont_touch_network on the generated clock output port of the block. This will help prevent DC from buffering the clock network.

Caution should be exercised while using the set_dont_touch_network command. For instance, suppose a design contains gated clock circuitry and the set_dont_touch_network attribute has been applied to the clock input. This will prevent DC from appropriately buffering the gated logic, resulting in a DRC violation for the clock signal. The same will hold true for gated resets.

set_dont_touch is used to set a dont_touch property on the current_design, cells, references or nets. This command is frequently used during hierarchical compilation of the blocks. It can also be used for preventing DC from inferring certain types of cells present in the technology library.

dc_shell-t> set_dont_touch [current_design]

dc_shell-t> set_dont_touch [get_cells sub1]

dc_shell-t> set_dont_touch [get_nets gated_rst]

For example, this command may be used on the block containing spare gates. The command will then instruct DC not to disturb (or optimize) the instantiation of the spare gates block.

set_dont_use command is generally set in the .synopsys_dc.setup environment file. The command is instrumental in eliminating certain types of cells from the technology library that the user would not want DC to infer. For instance, by using the above command, you can filter out the flip-flops in your technology library whose names start with “SDFF” or “RSFF” as illustrated below.

dc_shell-t> set_dont_use [list mylib/SDFF* mylib/RSFF*]

set_input_delay specifies the input arrival time of a signal in relation to the clock. It is used at the input ports, to specify the time it takes for the data to be stable after the clock edge. The timing specification of the design usually contains this information, as the setup/hold time requirements for input signals. Given the top-level timing specification of the design, this information may also be extracted for the sub-blocks of the design, by utilizing the top-down characterize compile method or the design budgeting method, explained in Chapter 7.

dc_shell-t> set_input_delay -max 23.0 -clock CLK {datain}

dc_shell-t> set_input_delay -min 0.0 -clock CLK {datain}

In Figure 6-3, the maximum input delay constraint of 23ns and the minimum input delay constraint of 0ns are specified for the signal datain with respect to the clock signal CLK, with a 50% duty cycle and a period of 30ns. In other words, the setup-time requirement for the input signal datain is 7ns, while the hold-time requirement is 0ns.

If both -min and -max options are omitted, the same value is used for both the maximum and minimum input delay specifications.

set_output_delay command is used at the output port, to define the time it takes for the data to be available before the clock edge. The timing specification of the design usually contains this information. Given the top-level timing specification of the design, this information may also be extracted for the sub-blocks of the design, by utilizing the top-down characterize compile method or the design budgeting method, explained in Chapter 7.

dc_shell-t> set_output_delay -max 19.0 -clock CLK {dataout}

In Figure 6-4, the output delay constraint of 19ns is specified for the signal dataout with respect to the clock signal CLK, with a 50% duty cycle and a period of 30ns. This means that the data is valid for 11ns after the clock edge.
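The 7ns and 11ns figures quoted for Figures 6-3 and 6-4 are simply the clock period minus the specified external delay. The following Python sketch makes that arithmetic explicit; the helper name internal_budget is hypothetical, and the model deliberately ignores clock latency and uncertainty.

```python
def internal_budget(period, external_delay):
    """Time left inside the block once the external delay is taken out of one clock period."""
    return period - external_delay


# Figure 6-3: 30 ns clock, set_input_delay -max 23.0
print(internal_budget(30, 23.0))
# Figure 6-4: 30 ns clock, set_output_delay -max 19.0
print(internal_budget(30, 19.0))
```

A larger external delay therefore leaves less time for the logic inside the block, which is why these values act as a tightening or loosening of the block-level constraint.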

During the pre-layout phase, it is sometimes necessary to over-constrain selective signals by a small amount to maximize the setup-time, thereby squeezing extra timing margin in order to reduce the synthesis-layout iterations. To achieve this, one may fool DC by specifying the over-constrained values to the above commands. Remember that over-constraining designs by a large amount will result in unnecessary increase in area and increased power consumption.

A negative value (e.g., -0.5) may also be used to provide extra timing margin while fixing the hold-time violations after layout, by making use of the in-place optimization on the design, explained in Chapter 9.

set_clock_latency command is used to define the estimated clock insertion delay during synthesis. This is primarily used during the pre-layout synthesis and timing analysis. The estimated delay number is an approximation of the delay produced by the clock tree network insertion (done during the layout phase).

dc_shell-t> set_clock_latency 3.0 [get_clocks CLK]

set_clock_uncertainty command lets the user define the clock skew information. Basically this is used to add a certain amount of margin to the clock, both for setup and hold times. During the pre-layout phase one can add more margin as compared to the post-layout phase.

dc_shell-t> set_clock_uncertainty -setup 0.5 -hold 0.25 \
                [get_clocks CLK]

It is strongly recommended that users specify a certain amount of margin for both the pre-layout and the post-layout phases. The main reason for doing this is to make the chip less susceptible to the process variations that may occur during manufacturing.
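How the uncertainty values eat into the timing budget can be sketched with the usual slack arithmetic for a single-cycle path between two registers on the same clock. This is a simplified Python model, not DC's actual calculation: the function names, the 30 ns period, the path delays and the library setup/hold numbers are all hypothetical, and equal clock latency at both registers is assumed so that it cancels out.

```python
def setup_slack(period, data_arrival_max, lib_setup, setup_uncertainty):
    """Setup slack: data must arrive before the next edge, less setup time and margin."""
    required = period - setup_uncertainty - lib_setup
    return required - data_arrival_max


def hold_slack(data_arrival_min, lib_hold, hold_uncertainty):
    """Hold slack: data must not change until hold time plus margin after the same edge."""
    required = lib_hold + hold_uncertainty
    return data_arrival_min - required


# Uncertainty values from the command above: -setup 0.5 -hold 0.25
print(setup_slack(period=30.0, data_arrival_max=28.0, lib_setup=0.25, setup_uncertainty=0.5))
print(hold_slack(data_arrival_min=0.5, lib_hold=0.125, hold_uncertainty=0.25))
```

In this model every increase in the uncertainty reduces both slacks by the same amount, which is the extra margin the recommendation above asks for.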

set_clock_transition for some reason does not get as much attention as it deserves. However, this is a very useful command, used during the pre-layout synthesis, and for timing analysis. Using this command forces DC to use the specified transition value (that is fixed) for the clock port or pin.

dc_shell-t> set_clock_transition 0.3 [get_clocks CLK]

Setting a fixed value for transition time of the clock signal in pre-layout is essential because of a large fanout associated with the clock network. Using this command enables DC to calculate realistic delays for the logic being fed by the clock net based on the specified clock signal transition value. This is further explained in the “clocking issues” section later in the chapter.

set_propagated_clock is used during the post-layout phase when the design has undergone the insertion of the clock tree network. In this case, the latency is derived using the traditional method of delay calculation.

dc_shell-t> set_propagated_clock [get_clocks CLK]

6.2 Advanced Constraints

This section describes additional design constraints that go beyond the general constraints covered in the previous section. These constraints consist of specifying false paths, multicycle paths, max and min delays etc. In addition, this section also discusses the process of grouping timing critical paths for extra optimization.

It must be noted however, that the use of too many timing exceptions, such as false paths and multicycle paths, causes significant impact on the run times.

set_false_path is used to instruct DC to ignore a particular path for timing or optimization. Identification of false paths in a design is critical. Failure to do so compels DC to optimize all paths in order to reduce total negative slack. Consequently, the critical timing paths may be adversely affected due to optimization of all the paths, which also includes the false paths.

The valid startpoints to be used for this command are the input ports or the clock pins of the sequential elements, and the valid endpoints are the output ports or the data pins of the sequential cells. In addition, one can further target a particular path using the -through switch.

dc_shell-t> set_false_path -from in1 -through U1/Z -to out1

Use this command when the timing critical logic is failing the static timing analysis because of the false paths.

set_multicycle_path is used to inform DC regarding the number of clock cycles a particular path requires in order to reach its endpoint. DC automatically assumes that all paths are single cycle paths and will unnecessarily try to optimize the multicycle segment in order to achieve the timing. This may have a direct impact on adjacent paths as well as the area. Also, the command provides the -through option that facilitates isolating the multicycle segment in a design.

dc_shell-t> set_multicycle_path 2 -from U1/Z \
                -through U2/A \
                -to out1
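The effect of the multiplier on the setup check can be sketched as follows. This is a simplified Python model covering the setup side only; the helper name setup_capture_time and the 30 ns period are hypothetical, and the hold check is not modeled.

```python
def setup_capture_time(period, multiplier=1):
    """Time of the capturing clock edge, measured from the launching edge."""
    return multiplier * period


print(setup_capture_time(30))       # default single-cycle assumption
print(setup_capture_time(30, 2))    # with a multicycle value of 2
```

With a multiplier of 2 the path is given two clock periods to reach its endpoint, so DC no longer tries to squeeze it into one.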

set_max_delay defines the maximum delay required in terms of time units for a particular path. In general, it is used for the blocks that contain combinational logic only. However, it may also be used to constrain a block that is driven by multiple clocks, each with a different frequency. This command has precedence over DC derived timing requirements.

For blocks only containing combinational logic, one may either create a virtual clock and constrain the block accordingly, or use this command to constrain the total delay from all inputs to all outputs, as shown below:

dc_shell-t> set_max_delay 5 -from [all_inputs] -to [all_outputs]

Although Synopsys recommends defining only a single clock per block, there are situations where a block may contain multiple clocks, each with a different frequency. To constrain such a block, one may define all the clocks in the block using the normal create_clock and set_dont_touch_network commands. However, it becomes tedious to assign input delays of signals related to individual clocks. To avoid this situation, an alternative approach is to define the first clock (the most stringent one) using the normal approach, while constraining other clocks through the set_max_delay command, as shown below.

dc_shell-t> set_max_delay 0 -from CK2 \
                -to [all_registers -clock_pins]

The value of 0 signifies that a zero delay value is desired between the input port CK2 and the input clock pins of all the flops within the block. In addition, one may also need to apply set_dont_touch_network for other clocks. This method is suitable for designs containing gated clocks or resets.

set_min_delay is the opposite of the set_max_delay command, and is used to define the minimum delay required, in time units, for a particular path. Specifying this command in conjunction with the set_fix_hold command (described in Chapter 9) instructs DC to add delays in the block to meet the specified minimum delay. This command also takes precedence over DC-derived timing requirements.

dc_shell-t> set_min_delay 3 -from [all_inputs] -to [all_outputs]
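Taken together, the two commands bound a path delay from above and from below. The following Python sketch shows the resulting slack arithmetic; the limits of 5 and 3 time units come from the examples above, while the path delays and the function name are assumed purely for illustration.

```python
def delay_slacks(path_delay, max_delay, min_delay):
    """Return (max_slack, min_slack) for one path.

    max_slack < 0 means the path is too slow for the maximum delay limit;
    min_slack < 0 means the path is too fast for the minimum delay limit.
    """
    max_slack = max_delay - path_delay
    min_slack = path_delay - min_delay
    return max_slack, min_slack


# Hypothetical input-to-output path delays checked against limits of 5 and 3.
for delay in (2.0, 4.0, 6.5):
    print(delay, delay_slacks(delay, max_delay=5.0, min_delay=3.0))
```

In this simplified view, a negative min_slack is the kind of violation that DC repairs by adding delay when set_fix_hold is in effect, and a negative max_slack is repaired by speeding the path up.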

The group_path command is used to bundle together timing-critical paths in a design for cost function calculations. Groups enable you to prioritize the grouped paths over others. Different options exist for this command, including specification of critical range and weights.

dc_shell-t> group_path -to [list out1 out2] -name grp1

Adding too many groups has a significant impact on compile time. Therefore, use it only as a last resort.

Exercise caution when using this command. You may find that it increases the delay of the worst violating path in the design. This is because DC prioritizes the grouped paths over other paths in the design: in order to improve the overall cost function, DC will try to optimize the grouped paths first and may degrade the timing of another group's worst violator.
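The trade-off can be illustrated with a deliberately simplified cost model, sketched here in Python. It assumes that each path group contributes its worst violation multiplied by the group weight; the group names, weights and slack values are hypothetical, and the real cost function has more terms than this.

```python
def max_delay_cost(groups):
    """Simplified delay cost: sum over groups of weight * worst violation.

    `groups` maps a group name to (weight, list of path slacks in ns).
    """
    cost = 0.0
    for weight, slacks in groups.values():
        worst_violation = max(0.0, -min(slacks))
        cost += weight * worst_violation
    return cost


# All paths in one default group: only the -4.0 ns path contributes.
ungrouped = {"default": (1.0, [-4.0, -1.5, 0.5])}

# The -1.5 ns path moved into its own group with a weight of 5.
grouped = {"default": (1.0, [-4.0, 0.5]), "grp1": (5.0, [-1.5])}

print(max_delay_cost(ungrouped))
print(max_delay_cost(grouped))
```

With these assumed numbers, removing 1 ns from the grp1 violation lowers the cost five times as much as removing 1 ns from the worst path in the design, so an optimizer that minimizes this cost favors grp1 even if the overall worst violator suffers.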

6.3 Clocking Issues

In any design, the most critical part of synthesis is the clock description. There are always issues concerning the pre- and post-layout definitions.

In the past, big buffers were placed at the source of the clock to drive the full clock network. Thick clock spines were used in the layout to distribute clock network delays evenly and to minimize clock skew. Although this method sufficed for sub-micron technologies, it fails badly in the VDSM realm. Interconnect RCs now account for a major part of the total delay, mainly because of the increased resistance of the shrinking metal widths. It is difficult, if not impossible, to model clocks using the traditional approach.

With the arrival of complex layout tools capable of synthesizing clock trees, the traditional method has changed dramatically. Since layout tools have the cell placement information, they are best equipped to synthesize the clock trees. It is therefore necessary to describe clocks in DC such that the description imitates the clock delays and skews of the final layout.

6.3.1 Pre-Layout

For the reasons explained above, it is best to estimate the clock tree latency and skew during the pre-layout phase. To do this, use the following commands:

dc_shell-t> create_clock -period 40 -waveform {0 20} CLK

dc_shell-t> set_clock_latency 2.5 CLK

dc_shell-t> set_clock_uncertainty -setup 0.5 -hold 0.25 CLK

dc_shell-t> set_clock_transition 0.1 CLK

dc_shell-t> set_dont_touch_network CLK

dc_shell-t> set_drive 0 CLK

In the above example, a delay of 2.5 ns is specified as the overall latency for the clock signal CLK. In addition, the set_clock_uncertainty command approximates the clock skew. One can specify different values for the setup and hold uncertainties by using the -setup and -hold options, as shown above.
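How these numbers enter the timing checks can be sketched in a few lines of Python. The sketch assumes a register-to-register path on a single ideal clock, so the same 2.5 ns latency applies at launch and capture and cancels out; the data path delays and the function names are hypothetical, and library setup and hold times default to zero for simplicity.

```python
def prelayout_setup_slack(period, data_delay, setup_uncertainty, lib_setup=0.0):
    """Setup slack with an ideal clock; latency cancels between launch and capture."""
    return period - setup_uncertainty - lib_setup - data_delay


def prelayout_hold_slack(data_delay, hold_uncertainty, lib_hold=0.0):
    """Hold slack with an ideal clock; launch and capture use the same edge."""
    return data_delay - hold_uncertainty - lib_hold


# Period and uncertainties from the commands above; path delays are assumed.
print(prelayout_setup_slack(period=40.0, data_delay=38.0, setup_uncertainty=0.5))
print(prelayout_hold_slack(data_delay=0.75, hold_uncertainty=0.25))
```

The uncertainty values therefore act as a margin that is taken directly out of the slack, which is how they stand in for the clock skew that is not yet known before layout.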

Furthermore, specification of the clock transition is essential. This restricts the max transition value of the clock signal. The delay through a cell is affected by the slope of the signal at its input pin and by the capacitive loading present at its output pin. The clock network generally feeds a large number of endpoints. This means that, although the clock latency value is fixed, the input transition time of the clock signal at the endpoint gates will still be slow. This results in DC calculating unrealistic delays for the endpoint gates, even though in reality the post-routed clock tree ensures fast transition times.

6.3.2 Post-Layout

Defining clocks after layout is relatively easy, since the user does not need to worry about the clock latency and skew. They are determined by the quality of the routed clock tree.

Some layout tools provide a direct interface to DC. This provides a smooth mechanism for taking the routed netlist, including the clock tree, back to DC. If such an interface is not available, the user should extract the clock latency and skew information from the layout tool. This information can then be used to define the clock latency and clock skew using the pre-layout approach described before. If, however, the netlist can be ported to DC, then the following commands may be used to define the clocks:

dc_shell> create_clock -period 40 -waveform [list 0 20] CLK

dc_shell> set_propagated_clock CLK

dc_shell> set_clock_uncertainty -setup 0.25 -hold 0.05 CLK

dc_shell> set_dont_touch_network CLK

dc_shell> set_drive 0 CLK

Notice the absence of the set_clock_latency command and the inclusion of the set_propagated_clock command. Since the clock tree is now inserted in the netlist, the user should propagate the clock instead of fixing its latency to a certain value. Similarly, the set_clock_transition command is no longer required, since DC will now calculate the input transition value of the clock network based on the clock tree. In addition, a small clock uncertainty value may also be defined. This ensures a robust design that will function across a wider process variance.

Some companies do not possess their own layout tools and rely on outside vendors to perform the layout. This situation of course varies from one company to another. If the vendor provides the user with the post-routed netlist containing the clock tree, then the above method can be utilized. In some instances, instead of providing the post-routed netlist, the vendor supplies only the SDF file containing point-to-point timing for the entire clock network (and the design). In such a case, the user only needs to define the clock for the original netlist and back-annotate the SDF to the original netlist, without propagating the clock. The clock skews and delays will be determined from the SDF file when performing static timing analysis.

6.3.3 Generated Clocks

Many complex designs contain internally generated clocks. An example is clock divider logic that generates secondary clock(s) of a different frequency, derived from the primary clock source. Even if the primary clock has been designated as the clock source, a limitation of DC is that it does not automatically create a clock object for the generated clocks.

Consider the logic illustrated in Figure 6-5. A clock divider circuit in the clk_div block divides the frequency of the primary clock CLK by two and generates the divided clock that drives Block A. The primary clock is also used to clock Block B, and is buffered internally (in the clk_div block) before feeding Block B.

Assigning a clock object through the create_clock command on the top-level CLK input is sufficient for the clock feeding Block B. This is because the clkB net inherits the clock object (through the buffer) specified at the primary source. However, clkA is not so fortunate. DC is unable to propagate the clock object throughout the entire net because the clock object specified on the primary source CLK stops at the register (shown as the shaded flop). To handle this situation, the clock object for clkA should be specified on the output port of the clk_div block. The following commands may be used to specify the clocks for the above example:

dc_shell> create_clock -period 40 -waveform {0 20} CLK

dc_shell> create_clock -period 80 -waveform {0 40} \
                       find(port, "clk_div/clkA")

Alternatively, one may use the create_generated_clock command to describe the clock, as follows:

dc_shell-t> create_generated_clock -name clkA \
                                   -source CLK \
                                   -divide_by 2
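The period and waveform that a divide-by-2 clock should receive can be derived from the source clock by simple arithmetic. The Python sketch below is illustrative only; it assumes a divider that toggles on the rising edge of the source clock and produces a 50 percent duty cycle, and the function name is hypothetical.

```python
def divided_clock(period, rise_time, divide_by):
    """Period and waveform of an ideal divided clock.

    Assumes the divider output rises on a rising edge of the source clock
    and has a 50 percent duty cycle.
    """
    new_period = period * divide_by
    waveform = (rise_time, rise_time + new_period / 2)
    return new_period, waveform


# Source clock CLK: period 40 with waveform {0 20}, divided by 2.
print(divided_clock(period=40, rise_time=0, divide_by=2))
```

For the source clock used in this section, the result is a period of 80 with edges at 0 and 40, which matches the explicit create_clock definition given for clkA above.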

6.4 Putting it Together

Example 6.1 provides a brief overview of some of the commands described in this chapter.

Example 6.1

## Design entry

analyze -format verilog sub1.v
analyze -format verilog sub2.v
analyze -format verilog top_block.v

elaborate top_block

current_design top_block
uniquify
check_design

## Setup operating conditions, wire load, clocks, resets

set_wire_load_model large_wl
set_wire_load_mode enclosed
set_operating_conditions WORST

create_clock -period 40 -waveform [list 0 20] CLK
set_clock_latency 2.0 [get_clocks CLK]
set_clock_uncertainty -setup 1.0 -hold 0.05 [get_clocks CLK]

set_dont_touch_network [list CLK RESET]

## Input drives

set_driving_cell -cell [get_lib_cell buff3] -pin Z [all_inputs]
set_drive 0 [list CLK RST]

## Output loads

set_load 0.5 [all_outputs]

## Set input & output delays

set_input_delay 10.0 -clock CLK [all_inputs]
set_input_delay -max 19.0 -clock CLK {IN1 IN2}
set_input_delay -min -2.0 -clock CLK IN3

set_output_delay 10.0 -clock CLK [all_outputs]

## Advanced constraints

group_path -from IN4 -to OUT2 -name grp1

set_false_path -from IN5 -to sub1/dat_reg*/*

set_multicycle_path 2 -from sub1/addr_reg/CP \
                      -to sub2/mem_reg/D

## Compile and write the database

compile

current_design top_block

write -hierarchy -output top_block.db
write -format verilog -hierarchy -output top_block.sv

## Create reports

report_timing -nworst 50

6.5 Chapter Summary

This chapter described all the basic and advanced commands used in DC, along with numerous tips to enhance the synthesis process. Attention was also given to the real-world issues facing designers as they descend deep into VDSM technology.

A separate section was dedicated to issues related to clocks. This section described various techniques useful for specifying clocks, both pre- and post-layout. It also included a topic on the specification of generated clocks, which are present in almost all designs. Finally, example DC scripts were included to guide users in performing complex and successful synthesis.

7

OPTIMIZING DESIGNS

Ideally, a synthesized design that meets all timing requirements and occupies the smallest area is considered fully optimized. To achieve this goal, one must understand the behavior of the synthesis process.

This chapter guides the reader in optimizing the design to obtain the best possible results.

7.1 Design Space Exploration

Achieving the smallest area while maximizing the speed of the design requires a fair amount of experimentation and iterative synthesis. The process of analyzing the design for speed and area to achieve the fastest logic with minimum area is termed design space exploration.

Various factors influence the optimization process, primarily the coding style. While coding, designers generally focus on the functionality of the design and may not consider the synthesis guidelines previously explained in Chapter 5 (this is a fact of life; we just have to live with it). At a later stage, modifications to the HDL code are performed to facilitate the synthesis process. In reality, the HDL is generally fixed and only minor modifications are made, since major changes may impact other blocks or test benches. For this reason, changing the HDL code to help synthesis is less desirable.

For the sake of design space exploration, we can assume that the HDL code is frozen. It is now the designer’s responsibility to minimize the area and meet the target timing requirements through synthesis and optimization.

Starting with version 98 of DC (DC98), the compile flow changed: timing is prioritized over area, as shown in Figure 7-1. Another major difference between DC98 and previous versions is that DC98 performs compilation to reduce “total negative slack” instead of “worst negative slack”. This produces better timing results but has some impact on the area. Also, in previous versions area minimization was handled automatically, whereas DC98 and later versions require designers to specify area constraints explicitly. Some area cleanup is generally performed by default even without area constraints, but better results are obtained by including them.

Although delay is prioritized over area, it is extremely important to provide DC with realistic constraints. Some designers, while performing a bottom-up compile, fail to realize this point and over-constrain the design. This causes DC to bloat the logic in order to meet the unrealistic timing goals. This is especially true for DC98 because it works on reducing the total negative slack. The relationship between constraints and area is shown in Figure 7-2, which emphasizes that the area increases considerably with tightening constraints.

Another view of varying constraints is shown in Figure 7-3, which illustrates the relationship between constraints and delay across the design. The actual delay of the logic decreases with tightening constraints, while relaxed constraints produce increased delay across the design. The horizontal part of the line on the left denotes constraints so tight that further tightening will not reduce the delay. Similarly, the horizontal part of the line on the right signifies fully relaxed constraints, resulting in no further increase in delay.

To further explain this concept, consider the diagram shown in Figure 7-4. For an overly constrained design, DC tries to synthesize “vertical logic” to meet the tight timing constraints. However, if the timing constraints are nonexistent, the synthesized design will result in “horizontal logic”, violating the actual timing specifications.

The idea here is to find a common ground by specifying realistic timing constraints. It is recommended to over-constrain the design by a small amount (maybe 10 percent tighter than required) to avoid too many synthesis-layout iterations. This produces minimum area for the design while still meeting the timing specifications. To this end, choose the correct compile methodology, as described in subsequent sections.
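As a small worked example of this guideline, the arithmetic can be written out in Python. The 40 ns target period is borrowed from the clock examples in Chapter 6 and the helper name is hypothetical; the 10 percent figure is the rough margin suggested above, not a fixed rule.

```python
def overconstrained_period(target_period, margin=0.10):
    """Clock period to synthesize with, tightened by `margin` (a fraction)."""
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in the range [0, 1)")
    return target_period * (1.0 - margin)


synthesis_period = overconstrained_period(40.0)
print(f"{synthesis_period:.1f} ns")
```

Synthesizing against the tightened period leaves the difference as headroom for the delay that layout adds, while staying close enough to the real target to avoid bloating the logic.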

7.2 Total Negative Slack

The previous section briefly introduced the phrase “Total Negative Slack”, or TNS for short. With the advent of DC98, a lot of importance has been given to TNS, and designers need to understand this concept to perform successful logic optimization.

Prior to DC98, DC optimized the logic based on “Worst Negative Slack”, or WNS. The WNS is defined as the timing violation (negative slack) of a signal traversing from a startpoint to an endpoint for a particular path. During compile, DC would work on the worst violators one at a time in order to reduce the total violations of the block. For this reason, grouping paths and specifying the critical range for timing-critical segments was considered essential.

DC98 not only prioritizes delay over area but also targets TNS instead of WNS. To understand the concept of TNS, consider the logic diagram shown in Figure 7-5. The WNS in this case is -5 ns, from RegA to RegB. The TNS is the sum of the WNS at each endpoint, and in this case equals -8 ns, i.e., the WNS to RegA plus the WNS to RegB.
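The two metrics can be reproduced with a short Python sketch. The slack at RegB (-5 ns) and the total of -8 ns are taken from the description above, which implies -3 ns at RegA; the RegC entry and the function name are hypothetical and serve only to show that endpoints with positive slack do not contribute.

```python
def wns_tns(endpoint_slacks):
    """Return (WNS, TNS) from a mapping of endpoint name to its worst slack in ns."""
    violations = [slack for slack in endpoint_slacks.values() if slack < 0]
    wns = min(violations, default=0.0)  # single worst violation in the design
    tns = sum(violations)               # sum of the worst violation per endpoint
    return wns, tns


slacks = {"RegA": -3.0, "RegB": -5.0, "RegC": 1.2}  # RegC is an assumed extra endpoint
print(wns_tns(slacks))  # (-5.0, -8.0)
```

Fixing only the RegB endpoint would improve the WNS but still leave a TNS of -3 ns, which is why a TNS-driven compile keeps working on the remaining violating endpoints.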

There are several advantages to using this technique; primarily, it produces fewer timing violations than the previous method. Another benefit arises with the bottom-up compile methodology, where the critical paths present in a sub-module may not be seen as critical from the top level. Reducing the TNS of the overall design minimizes this effect. By providing a smaller number of violating paths to the timing-driven layout tool, fewer iterations between synthesis and layout can also be achieved.

Although reducing TNS rather than WNS produces fewer timing violations, it does have an impact on the overall area. It is recommended that you set area constraints regardless of the kind of optimization performed. By default, DC98 prioritizes TNS over area; area optimization occurs only for those paths with positive slack. In order to prioritize area over TNS, you may use the following command:

dc_shell> set_max_area 0 -ignore_tns

7.3 Compilation Strategies

Synopsys recommends the following compilation strategies, which depend entirely on how your design is structured and defined. It is up to the user to choose the most suitable compilation strategy for a design.

a) Top-down hierarchical compile method.
b) Time-budget compile method.
c) Compile-characterize-write-script-recompile (CCWSR) method.
d) Design Budgeting method.

7.3.1 Top-Down Hierarchical Compile

Prior to the release of DC98, the top-down hierarchical compile method was generally used to synthesize very small designs (less than 10K gates). Using this method, the source is compiled by reading the entire design. Based on the design specifications, the constraints and attributes are applied only at the top level. Although this method provides an easy push-button approach to synthesis, it was extremely memory intensive and viable only for very small designs.

The release of DC98 gave Synopsys the capability to synthesize million-gate designs by tackling much larger blocks (>100K gates) at a time. This may indeed be a feasible approach for some designs, depending on the design style (single clock, etc.) and other factors. You may use this technique to synthesize larger blocks at a time by grouping the sub-blocks together and flattening them to improve timing.

The advantages and disadvantages of this methodology are summarized below:

7.3.1.1 Advantages

a) Only top-level constraints are needed.
b) Better results due to optimization across the entire design.

7.3.1.2 Disadvantages

a) Long compile times (although DC98 is much faster than previous releases).
b) Incremental changes to the sub-blocks require complete re-synthesis.
c) Does not perform well if the design contains multiple clocks or generated clocks.

7.3.2 Time-Budgeting Compile

The second compilation approach is termed the time-budgeting strategy. This strategy is useful if the design has been partitioned properly, with timing specifications defined for each block of the design, i.e., designers have time-budgeted the entire design, including the inter-block timing requirements.

The designer manually specifies the timing requirements for each block of the design, thereby producing a separate synthesis script for each block. The synthesis is usually performed bottom-up, i.e., starting at the lowest level and ascending to the topmost level of the design. This method targets medium to very large designs and does not require large amounts of memory.

Consider the design illustrated in Figure 7-6. The top-level module incorporates blocks A and B. The specifications for both of these blocks are well defined and can be directly translated to Synopsys constraints. For designs like these, the time-budgeting compilation strategy is ideal.
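A manual time budget for a path that leaves block A and enters block B can be worked out as follows. This Python sketch is illustrative only: it assumes a register-to-register path crossing the block boundary on a single clock, the 40 ns period comes from the Chapter 6 examples, and the 25 percent share and the function name are hypothetical.

```python
def budget_interblock_path(period, share_a):
    """Split one clock period across a path from block A into block B.

    `share_a` is the fraction of the period granted to logic inside block A.
    Returns (output delay for A's port, input delay for B's port).
    """
    if not 0.0 < share_a < 1.0:
        raise ValueError("share_a must be between 0 and 1")
    time_in_a = period * share_a
    time_in_b = period - time_in_a
    output_delay_a = time_in_b  # time consumed outside A, after its output port
    input_delay_b = time_in_a   # time consumed outside B, before its input port
    return output_delay_a, input_delay_b


print(budget_interblock_path(period=40.0, share_a=0.25))
```

The two returned values are the numbers that would be applied through set_output_delay in the script for block A and set_input_delay in the script for block B, so that the two blocks together account for exactly one clock period.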

The advantages and disadvantages of this methodology are listed below:

7.3.2.1 Advantages

a) Easier to manage the design because of individual scripts.
b) Incremental changes to the sub-blocks do not require complete re-synthesis of the entire design.
c) Does not suffer from design style, e.g., multiple and generated clocks are easily managed.
d) Good quality of results in general because of flexibility in targeting and optimizing individual blocks.

7.3.2.2 Disadvantages

a) Tedious to update and maintain multiple scripts.
b) Critical paths seen at the top level may not be critical at the lower level.
c) The design may need to be incrementally compiled in order to fix the DRCs.

Figure 7-7 illustrates the directory structure and data organization suited to this strategy. To automate the synthesis process, a makefile is used (refer to the Appendix). The makefile specifies the dependencies of each block and employs the user-specified scripts (kept in the script directory) to compile the whole design, starting from the lowest level and ending at the topmost level. After the synthesis of each block, the results are automatically moved to their respective directories. The variables used in the makefile are defined in the user's .cshrc file; e.g., $SYNDB may be defined as /home/project/design/syn/db

7.3.3 Compile-Characterize-Write-Script-Recompile

This approach is useful for medium to very large designs that do not have good inter-block specifications defined. This method is not limited by hardware memory and allows for time budgeting between the blocks.

This approach requires constraints to be applied at the top level of the design, with each sub-block compiled beforehand. The sub-blocks are then characterized using the top-level constraints. This in effect propagates the required timing information from the top level to the sub-blocks. Performing a write_script on the characterized sub-blocks generates the constraint file for each sub-block. These constraint files are then used to re-compile each block of the design.

Although this approach normally produces good results, it is recommended that designers use the Design Budgeting method, the usage of which is explained in the next section.

7.3.3.1 Advantages

a) Less memory intensive.
b) Good quality of results because of optimization between sub-blocks of the design.
c) Produces individual scripts, which may be modified by the user.

7.3.3.2 Disadvantages

a) The generated scripts are not easily readable.
b) Synthesis suffers from the ping-pong effect. In other words, it may be difficult to achieve convergence between blocks.
c) A change at a lower-level block generally requires complete re-synthesis of the entire design.
d) Long runtimes if the block becomes over-constrained.

7.3.4 Design Budgeting

This method is by far the most suitable compile strategy for tackling designs that do not have good inter-block specifications. This approach automatically allocates the top-level design specifications to the lower-level blocks. Design budgeting is invoked from within DC or PT, although it can also be invoked by typing budget_shell. This shell uses the Tcl interface and cannot be used with the old non-Tcl dc_shell commands. It is recommended to use PT if you want to take advantage of PT's GUI.

This method cannot be used directly from the RTL stage. The design must first be synthesized to a mapped gate-level netlist before it can be budgeted. Once the design is synthesized, the budgeter is run on the entire design and scripts are generated for the sub-blocks. The number of hierarchical levels to budget is under the full control of the user; in other words, the budgeter will allocate budgets for as many hierarchical levels as the user defines. The generated scripts, with accurate constraints, are subsequently used to re-synthesize each block in parallel (to reduce runtime).

This is a very powerful method of gate-level optimization. However, it is an iterative process. The user has full control over the scripts, and thus over the amount of optimization. The generated scripts can be further massaged to suit individual needs. In addition, this method can be used even after the layout stage in order to produce more accurate constraints. This is accomplished by budgeting the back-annotated design.

7.3.4.1 Advantages

a) Provides accurate constraints across the entire design, thus better QOR.
b) Does not suffer from the ping-pong effect (as in the Characterize-compile method).
c) Saves runtime by providing the ability to perform parallel compiles.
d) Provides the ability to customize the scripts to suit individual needs.
e) Scripts can be used after the elaborate stage (design in GTECH stage).
f) Less memory intensive.

7.3.4.2 Disadvantages

a) Cannot budget the RTL itself. The design must be a structured gate-level netlist.
b) Cannot budget using both best and worst operating conditions.
c) Iterative process (although this is not a severe limitation).

An example script illustrating this strategy is shown below. The budgeting command is allocate_budgets.

pt_shell> read_verilog mydesign.sv
pt_shell> source constraints.scr    ;# Top-level constraints
pt_shell> allocate_budgets -levels 2 -write_context -format dctcl

In the above example, the allocate_budgets command invokes the Design Budgeter, which allocates budgets to each sub-block of the design, descending down 2 levels of hierarchy. The -write_context option instructs the budgeter to generate scripts. The -format option specifies the format of the generated scripts. The allowed values are ptsh (PrimeTime Tcl format), dctcl (DC Tcl format: dc_shell-t) and dcsh (DC format: dc_shell). The ptsh format is the default.

Several other commands are also available for this method. Users are advised to refer to the Design Budgeting User Guide for full details.

7.4 Resolving Multiple Instances

Before proceeding to optimization, one needs to resolve multiple instances of the sub-blocks of the design. This is a required step, since DC does not permit compilation until the multiple instances present in the design are resolved.

To better explain the concept of multiple instantiations of a module, consider the architecture of the design shown in Figure 7-8. Let's presume that you have chosen the time-budgeting compilation strategy and have synthesized moduleA separately. You are now compiling moduleB, which instantiates moduleA twice, as U1 and U2. The compilation will be aborted by DC with an error message stating that moduleA is instantiated 2 times in moduleB. There are two recommended methods of resolving this. You may either assign a dont_touch attribute to moduleA before reading moduleB, or uniquify moduleB. uniquify is a dc_shell command that in effect creates unique definitions of multiple instances. In this case, it will generate moduleA_U1 and moduleA_U2 (in Verilog), corresponding to instances U1 and U2 respectively, as illustrated in Figure 7-9.
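The bookkeeping behind uniquification can be pictured with a small Python sketch. It is not how DC implements the command; it only mimics the renaming described above, using the module_instance naming of this example (the names the tool actually generates may differ). The U3/moduleC entry and the function name are hypothetical.

```python
from collections import Counter


def uniquify(instances):
    """Give every multiply-instantiated module a private copy per instance.

    `instances` maps instance name -> module name within one parent design.
    Modules instantiated only once keep their original name.
    """
    counts = Counter(instances.values())
    return {
        inst: f"{module}_{inst}" if counts[module] > 1 else module
        for inst, module in instances.items()
    }


module_b = {"U1": "moduleA", "U2": "moduleA", "U3": "moduleC"}
print(uniquify(module_b))
```

After this step each instance refers to its own design, so U1 and U2 can be optimized differently according to their individual surroundings.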

It is recommended to always uniquify the design, regardless of the compilation methodology chosen. The reason for this suggestion becomes evident when planning to perform clock tree synthesis during layout. This will be explained in detail in Chapter 9.

7.5 Optimization Techniques

This section describes various optimization techniques used to fine-tune your design. Before we start on this subject, it is important to know that DC uses cost functions to optimize the design. This topic is thoroughly covered in the DC Reference manual and therefore will not be dealt with here. I would instead like to concentrate on practical optimization techniques rather than the mathematical algorithms used for calculating cost functions. Let’s just say that DC calculates the cost functions based on the design constraints and DRCs to optimize the design.

7.5.1 Compiling the Design

Compilation of the design or modules is performed by the compile command. This command performs the actual mapping of the HDL code to gates from the specified target library. DC provides a range of options for this command to fully control the mapping optimization of the design.

The command syntax, along with the most commonly used options, is described below.

compile -map_effort <low | medium | high>
        -incremental_mapping
        -in_place
        -no_design_rule | -only_design_rule
        -scan

By default, compile uses -map_effort medium, which usually produces ideal results for most designs. It also uses the default settings for the structuring and flattening attributes, described in the next section. -map_effort high should be used only if the target objectives are not achieved through the default compile. This option enables DC to maximize its effort around the critical path by restructuring and re-mapping logic in order to meet the specified constraints. Beware: this usually produces long compile times.

The -incremental_mapping option is used only after the initial compile (i.e., once the design has been mapped to gates of the technology library), as it works only at the gate level. This is a very useful and commonly used option. It is generally used to improve the timing of the logic and to fix DRCs. During incremental compile, DC performs various mapping optimizations in order to improve timing. Although Synopsys states that the resultant design will not worsen and may only improve in terms of design constraints, on rare occasions using this option may actually degrade the timing. Users are therefore advised to experiment and use their own judgement. Nevertheless, the usefulness of this option is apparent when fixing DRCs at the top level of the design. To do this, you may use the -only_design_rule option while compiling incrementally. This prevents DC from performing mapping optimizations, so that it concentrates only on fixing the DRCs.

The -no_design_rule option is not used frequently and, as the name suggests, instructs DC to refrain from fixing DRCs. You may use this option for initial passes of compile, when you don’t want to waste time fixing DRC violations. At a later stage, generate the constraint report and then re-compile incrementally to fix DRCs. This is obviously a painful approach, and users are advised to use their own judgement.

To achieve post-layout timing convergence, it is sometimes necessary to resize the logic to fix timing violations. The -in_place option provides the capability of resizing the gates. This option is governed by various switches that designers can use to control the buffering of the logic. Its usage is described in detail in Chapter 9.

The -scan option uses the test-ready compile feature of DC. This option instructs DC to map the design directly to scan flops, as opposed to synthesizing to normal flops and then replacing them with their scan equivalents in order to form the scan chain. The advantage of this feature is that, since scan flops normally have different timing from their non-scan equivalents (normal flops), DC takes the scan-flop timing into account while synthesizing. This produces optimized scan-inserted logic with correct timing.

7.5.2 Flattening and Structuring

Before we begin this discussion, it must be noted that the term “flattening” used here does not imply “removing the hierarchy”. Flattening is a common academic term for reducing logic to a 2-level AND/OR representation. DC uses this approach to remove all intermediate variables and parentheses (using Boolean distributive laws) in order to optimize the design. This option is set to “false” by default.

The optimization of the design is performed in two phases, as shown in Figure 7-10. Logic optimization is performed first, by structuring and flattening the design. The resulting structure is then mapped to gates using mapping optimization techniques. The default settings for the flatten and structure attributes are:


As shown in the above table, flattening (set_flatten true) the design and Boolean optimization (set_structure -boolean true) are only performed when enabled.

7.5.2.1 Flattening

Flattening is useful for unstructured designs, e.g., random logic or control logic, since it removes intermediate variables and uses boolean distributive laws to remove all parentheses. It is not suited for designs consisting of structured logic, e.g., a carry-look-ahead adder or a multiplier.

Flattening results in a two-level, sum-of-products form, resulting in a vertical logic, i.e., few logic levels between the input and output. This generally produces faster logic, since the logic levels between the inputs and outputs are minimized. Depending upon the form of design flattened and the type of effort used, the flattened design can then be structured before the final technology mapping optimization. This is the recommended approach and should be performed to reduce the area, because flattening may have a significant impact on the area of the design. A point to remember: if you flatten the design using the -effort high option, then DC may not be able to structure the design, so use this attribute judiciously.

In general, compile the design using default settings, since most of the time they perform adequately. Designs failing timing objectives may be flattened, with structuring performed as a second phase (on by default). If the design is still failing timing goals, then turn off structuring and flatten only. You may also experiment with inverting the phase assignment, which sometimes produces remarkable results. This is done by setting the -phase option of the set_flatten command to “true”. This enables DC to compare the logic produced by inverting the equation versus the non-inverted form of the equation.


For a hierarchical design, the flatten attribute is set only on the current_design. Sub-blocks do not inherit this attribute. If you want to flatten the sub-blocks, then you have to specify them explicitly using the -design option. The syntax for the flatten attribute along with the most commonly used options is:

set_flatten <true | false>
    -design <list of designs>
    -effort <low | medium | high>
    -phase <true | false>

7.5.2.2 Structuring

Structuring is used for designs containing regular structured logic, e.g., a carry-look-ahead adder. It is enabled by default for timing only. When structuring, DC adds intermediate variables that can be factored out. This enables sharing of logic, which in turn results in reduction of area. For example:

Before Structuring
P = ax + ay + c
Q = x + y + z

After Structuring
P = aI + c
Q = I + z
I = x + y
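The equivalence of the two forms above can be checked exhaustively. The following is only an illustrative Python sketch, not anything DC runs; the function names before and after are made up for this example.

```python
from itertools import product

def before(a, c, x, y, z):
    # Unstructured form: P = ax + ay + c, Q = x + y + z
    p = (a and x) or (a and y) or c
    q = x or y or z
    return bool(p), bool(q)

def after(a, c, x, y, z):
    # Structured form: the shared term I = x + y is built once and reused
    i = x or y
    p = (a and i) or c
    q = i or z
    return bool(p), bool(q)

# Compare both forms over all 32 input combinations.
equivalent = all(before(*bits) == after(*bits)
                 for bits in product([False, True], repeat=5))
print("equivalent:", equivalent)
```

The structured form computes x + y once and shares it between P and Q, which is where the area saving comes from; the extra intermediate variable is also why the shared logic can add delay.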

It is important to note that structuring produces shared logic that has an impact on the total delay of the logic. In the absence of specified timing constraints (or if structuring is turned off with respect to timing), the logic produced will generally result in large delays across the block boundaries. Therefore, it is recommended that realistic constraints be specified, in addition to using the default settings.

Structuring comes in two flavors: timing (default) and boolean optimization. The latter is a useful method of reducing area, but has a greater impact on timing. Good candidates for the boolean type of optimization are non-timing-critical circuitry, e.g., random logic structures and finite state machines. As the name suggests, this algorithm uses boolean logic optimization to reduce area. Prior to version v1997.01, DC used a different algorithm to perform boolean optimization. Synopsys has since introduced another algorithm that is more efficient and requires less run time. This algorithm is based on automatic test pattern generation (ATPG) techniques to manipulate logic networks. To enable this algorithm, you have to set the following variable to “true”:

compile_new_boolean_structure = true

As with flattening, the set_structure command applies only to the current_design. The syntax of this command along with the most commonly used options is:

set_structure <true | false>
    -design <list of designs>
    -boolean <true | false>
    -timing <true | false>

In general, designs compiled with default settings produce satisfactory results. However, if your design is not timing critical and you want to minimize for area only, then set the area constraints (set_max_area 0) and perform boolean optimization. For all other cases, structure with respect to timing only.

7.5.3 Removing Hierarchy

By default, DC maintains the original hierarchy of the design. The hierarchy is in effect a logical boundary, which prevents DC from optimizing across this boundary. Many designers create unnecessary hierarchy for unknown reasons. This not only makes the synthesis process more cumbersome but also results in an increased number of synthesis scripts. As mentioned before, DC optimizes within logical boundaries. Needless hierarchy in the design limits DC to optimizing within each boundary, without optimizing across the hierarchy.

Consider the logic shown in Figure 7-11 (a). The top level (Block T) incorporates two blocks, A and B. The logic present at the output of block A and at the input of block B is separated by the block boundaries. Two separate optimizations of blocks A and B may not result in an optimal solution.


By combining blocks A and B (i.e., removing the boundaries) as shown in Figure 7-11 (b), the two logic bubbles may be optimized as one, resulting in a more optimal solution. Designers (not Synopsys) refer to this process as “flattening” the design.

To perform this, you may use the following command:

dc_shell> current_design BlockT

dc_shell> ungroup -flatten -all

7.5.4 Optimizing Clock Networks

Optimizing clock networks is one of the hardest operations to perform. This is due to the fact that as we descend towards VDSM technologies, the resistance of the metal increases dramatically, causing enormous delays from the input of the clock pin to the registers. Also, low power design techniques require gating the clock to minimize switching of the transistors when the data does not need to be clocked. This technique uses a gate (e.g., an AND gate), with inputs for clock and enable (used to enable or disable the clock source).

Previous methodologies included placement of a big buffer at the top level of the chip, near the clock source, capable of driving all the registers in the design. Thick trunks and spines (e.g., fishbone structure) were used to fan the entire chip in order to reduce clock skew and minimize RC delays. Although this approach worked satisfactorily for technologies 0.5um and above, it is definitely not suited for VDSM technologies (0.35um and less). The above approach also meant an increased number of synthesis-layout iterations.

With the advent of complex layout tools, it is now possible to synthesize the clock tree within the layout tool itself. The clock tree approach works best for VDSM technologies and although power consumption is a concern, the clock latency and skew are both minimal compared to the big buffer approach. The clock tree synthesis (CTS) is performed during layout after the placement of the cells and before routing. This enables the layout tool to know the exact placement location of the registers in the floorplan. It is then easy for the layout tool to place buffers optimally, so as to minimize clock skews. Since clock optimization is a major cause of increased synthesis-layout iterations, performing CTS during layout reduces this cycle.

We still have to optimize the clocks during synthesis before taking the design to layout. We cannot assume that the layout tool will give us the magic clock tree that will solve all of our problems. Remember, the more optimized your initial netlist, the better results you will get from the layout tool.

So how do we optimize clock networks during synthesis? By applying set_dont_touch_network to the clock pin, you are assured that DC will not buffer up the network in order to fix DRCs. This approach works fine for most designs that do not contain clock-gating logic. But what if the clocks are gated? If you apply set_dont_touch_network to a clock that is gated, then DC will not even size up the gate (let’s assume a 2-input AND gate). This is because the set_dont_touch_network propagates through all the combinational logic (the AND gate, in this case), until it hits an endpoint (the input clock pin of the register, in this case). This causes the combinational logic to inherit the dont_touch attribute also, which results in un-optimized gating logic that may violate DRCs and hence degrade overall timing.
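The propagation behavior described above can be pictured as a simple graph traversal. The Python sketch below is only a conceptual model under stated assumptions: the toy netlist FANOUT, the cell names, and the function propagate_dont_touch are all hypothetical, and this is not how DC is implemented.

```python
from collections import deque

# Hypothetical toy netlist: each net lists the cells it drives as
# (cell_name, cell_kind, output_net). Registers carry no output_net
# because propagation stops at their clock pins.
FANOUT = {
    "clk":       [("U_gate", "comb", "gated_clk"), ("ff0", "reg", None)],
    "gated_clk": [("ff1", "reg", None), ("ff2", "reg", None)],
    "enable":    [("U_gate", "comb", "gated_clk")],
}

def propagate_dont_touch(start_net, fanout):
    """Return (nets, cells) reached from start_net through combinational
    cells, stopping at register pins."""
    nets, cells = {start_net}, set()
    queue = deque([start_net])
    while queue:
        net = queue.popleft()
        for cell, kind, out_net in fanout.get(net, []):
            if kind == "reg":
                continue          # endpoint reached: stop on this branch
            cells.add(cell)       # combinational cell inherits the attribute
            if out_net not in nets:
                nets.add(out_net)
                queue.append(out_net)
    return nets, cells

nets, cells = propagate_dont_touch("clk", FANOUT)
print(sorted(nets), sorted(cells))
```

In this model the gating cell U_gate ends up marked, which mirrors why the AND gate in the text cannot be sized up while the attribute is in place.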

For instance, suppose the clock output from the AND gate is fanning out to a large number of registers and DC inferred a minimum drive strength for the AND gate. This will cause slow input transition times at the registers, resulting in horrendous delays for the clock net. To avoid this, you may remove the set_dont_touch_network attribute and perform incremental compilation. This will size up the AND gate and also insert additional buffers from the output of the AND gate to the endpoints. Although this approach seems ideal, it does suffer from some shortcomings. Firstly, it takes a long time for incremental compile to complete, and on rare occasions it may produce sub-optimal results. Secondly, a lot of foresight is needed, e.g., you need to apply the set_dont_touch_network attribute on all other nets (resets and scan-related signals that may not require a clock tree).

A second approach is to find all high fanout nets in your design using the report_net command and buffer them from point to point using the balance_buffer command. (Refer to the DC reference manual for the actual syntax of this command.) Since the balance_buffer command does not take clock skew into account, it should not be used as an alternative to clock tree synthesis.

Another technique is to perform in-place optimization (IPO), using compile -in_place, with the compile_ok_to_buffer_during_inplace_opt switch set to “false”. This prevents DC from inserting additional buffers, so that it will only size up the AND gate.

It must be noted that the above-mentioned techniques are totally design dependent. Several methods have been provided that may be used for clock network optimization. Sometimes you may find that you have to perform all the above methods to get optimal results, and other times a single approach works perfectly.

Regardless of which method you use, you should also consider what you want to do during layout. For designs without gated clocks, it is preferable that CTS be performed at the layout level. For other designs, with gated clocks, you have to analyze the clock in the design (pre- and post-synthesis) carefully and take appropriate action. This may also include inserting the clock tree (during layout) after the AND gate for each bank of registers. Most layout tool vendors have realized this problem and offer various techniques to perform clock tree synthesis for gated clocks.

7.5.5 Optimizing for Area

By default, DC tries to optimize the design for timing. Designs that are not timing critical but area intensive should be optimized for area. This can be done by initially compiling the design with area requirements specified but no timing constraints. In other words, the design is synthesized with area requirements only.

In addition, one may choose to eliminate the high-drive strength gates by assigning the dont_use attribute to them. The reason for eliminating high-drive strength gates is that they are normally used to speed up the logic in order to meet timing; however, they are larger in size. By eliminating their usage, a considerable reduction in area may be achieved.

Once the design is mapped to gates, the timing and area constraints should again be specified (normal synthesis) and the design re-compiled incrementally. The incremental compile ensures that DC maintains the previous structure and does not bloat the logic unnecessarily.

7.6 Chapter Summary

Optimizing a design is the most time-consuming and difficult task, since it depends enormously on various factors, e.g., HDL coding styles, type of logic, constraints, etc. This chapter described advanced optimization techniques and how they affect the synthesis process.

A detailed description was given of how varying the design constraints affects timing and area. To reiterate, the best results are achieved by providing DC with realistic constraints. Over-constraining a design results in large area and sub-optimal results.

With the introduction of DC98, the optimization flow has changed, with more emphasis given to timing rather than area. Although by default area cleanup is always performed at the end of compilation, it is still recommended that area constraints be specified. Timing optimization is performed by DC98 by minimizing the total negative slack (TNS). Prior to DC98, timing optimization was performed by reducing the worst negative slack (WNS) per endpoint. The TNS optimization provides far superior results, although it does have an impact on the overall area of the design.
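To make the difference between the two metrics concrete, the following Python sketch computes WNS and TNS for two sets of endpoint slacks. The slack values are invented for illustration, and the functions are the usual textbook definitions rather than DC’s internal cost function.

```python
def wns(slacks):
    """Worst negative slack: the most negative endpoint slack (0 if none fail)."""
    return min(min(slacks), 0)

def tns(slacks):
    """Total negative slack: the sum of the slacks of all failing endpoints."""
    return sum(s for s in slacks if s < 0)

# Hypothetical endpoint slacks in picoseconds for two candidate netlists.
netlist_a = [-500, -400, -400, 200]   # several moderately failing endpoints
netlist_b = [-600, 100, 100, 200]     # one badly failing endpoint

for name, slacks in (("A", netlist_a), ("B", netlist_b)):
    print(name, "WNS =", wns(slacks), "TNS =", tns(slacks))
```

Judged by WNS alone, netlist A looks better (-500 against -600), while by TNS netlist B is clearly ahead (-600 against -1300) because only one endpoint is left to fix.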

Various compile strategies were illustrated in this chapter, along with examples to automate this process. To successfully optimize a design, you may choose a single methodology or mix these strategies to get the desired result. All the strategies have their own advantages and disadvantages, which have also been illustrated. Choose the one that best suits your design.

A separate section was devoted to uniquifying the design. Although some designers argue that this step may not be needed, it is recommended to always uniquify the design, for the reasons outlined in Chapter 9.

Finally, other optimization steps were discussed, with emphasis on “how to produce optimal synthesized netlists”. Various techniques, including clock network optimization and optimizing designs for area, were described along with recommended approaches.


8

DESIGN FOR TEST

The Design-for-Test or DFT techniques are increasingly gaining momentum among ASIC designers. These techniques provide measures to comprehensively test the manufactured device for quality and coverage.

Traditionally, testability was considered as an afterthought, with implementation done only at the end of the design cycle. This approach usually provided minimal coverage and often led to unforeseen problems that resulted in increased cycle time. Merging testability features early in the design cycle was the final solution, creating the name Design-for-Test.

8.1 Types of DFT

Various vendors, including Synopsys, provide solutions for incorporating testability in the design. Synopsys adds the DFT capabilities to DC through its DFT Compiler (DFTC), which is incorporated within the DC suite of tools. The main DFT techniques in use today are:


a) Scan insertion
b) Memory BIST insertion
c) Logic BIST insertion
d) Boundary-Scan insertion

Of all four, scan and logic BIST insertion are the most complex and challenging techniques, since they involve various design issues that need to be resolved in order to get full coverage of the design.

8.1.1 Memory and Logic BIST

Unfortunately, Synopsys does not provide any solution for automatic memory or logic BIST (Built-In Self-Test) generation. For this reason these two techniques are not covered in this section. However, there are vendors that do provide a complete solution; therefore a brief overview describing the main function of memory and logic BIST is included, giving designers an insight into these useful techniques.

The memory BIST comprises controller logic that uses various algorithms to generate input patterns that are used to exercise the memory elements of a design (say a RAM). The BIST logic is automatically generated, based upon the size and configuration of the memory element. It is generally in the form of synthesizable Verilog or VHDL, which is inserted in the RTL source with hookups leading to the memory elements. Upon triggering, the BIST logic generates input patterns, based upon a pre-defined algorithm, to fully examine the memory elements. The output result is fed back to the BIST logic, where a comparator is used to compare what went in against what was read out. The output of the comparator generates a pass/fail signal that signifies the integrity of the memory elements.
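The write, read-back and compare loop described above can be sketched in a few lines of Python. This is a behavioral illustration only: the Ram class, its stuck_at fault model and the checkerboard pattern are assumptions chosen for the example, and real BIST controllers are generated as RTL and typically use more thorough algorithms.

```python
class Ram:
    """Toy word-addressable RAM; stuck_at optionally forces one bit to 0."""
    def __init__(self, depth, width, stuck_at=None):
        self.cells = [0] * depth
        self.mask = (1 << width) - 1
        self.stuck_at = stuck_at                 # (address, bit) or None

    def write(self, addr, data):
        data &= self.mask
        if self.stuck_at and self.stuck_at[0] == addr:
            data &= ~(1 << self.stuck_at[1])     # faulty bit always stores 0
        self.cells[addr] = data

    def read(self, addr):
        return self.cells[addr]

def memory_bist(ram, depth, width):
    """Write a checkerboard and then its complement, read back, compare."""
    mask = (1 << width) - 1
    checker = 0x55555555 & mask                  # supports widths up to 32 bits
    for background in (checker, ~checker & mask):
        def pattern(addr):
            # Alternate the data between even and odd addresses.
            return background if addr % 2 == 0 else ~background & mask
        for addr in range(depth):
            ram.write(addr, pattern(addr))
        for addr in range(depth):
            if ram.read(addr) != pattern(addr):  # the comparator
                return False                     # fail
    return True                                  # pass

print("good RAM passes:", memory_bist(Ram(16, 8), 16, 8))
print("faulty RAM passes:", memory_bist(Ram(16, 8, stuck_at=(3, 2)), 16, 8))
```

Because every bit is written with both 0 and 1 across the two passes, a single stuck-at-0 bit is always caught by the comparator in this model.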

Logic BIST uses the same approach as memory BIST but targets the logic part of the design. The logic BIST uses a random pattern generator to exercise the scan chains in the design. The output is a compressed signature that is compared against a simulated signature. If the signature of the device under test (DUT) matches the simulated signature, the device passes; otherwise it fails. The main advantage of using logic BIST is that it eliminates the need for test engineers to generate huge scan vectors as inputs to the DUT. This saves a tremendous amount of test time. The disadvantage is that additional logic (thus area) is incorporated within the design, just for testing purposes.
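The pattern-generation and signature-comparison idea can be illustrated with a small Python model. Everything here is an assumption made for the sketch: a 4-bit LFSR stands in for the random pattern generator, a rotate-and-XOR register stands in for the signature compactor, and good_logic and faulty_logic are made-up circuits. Commercial logic BIST architectures are considerably more elaborate.

```python
def lfsr_patterns(seed, count):
    """4-bit LFSR (feedback from bits 3 and 2) used as the pattern source."""
    state = seed & 0xF
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF

def signature(circuit, patterns):
    """Compress the response stream into an 8-bit signature."""
    sig = 0
    for p in patterns:
        rotated = ((sig << 1) | (sig >> 7)) & 0xFF
        sig = rotated ^ circuit(p)
    return sig

def good_logic(p):
    a, b, c, d = p & 1, (p >> 1) & 1, (p >> 2) & 1, (p >> 3) & 1
    return (a & b) | (c & d)

def faulty_logic(p):
    # Same logic with input a stuck at 0, so the (a & b) term vanishes.
    c, d = (p >> 2) & 1, (p >> 3) & 1
    return c & d

def logic_bist(dut, golden):
    """Pass/fail decision: DUT signature against the simulated signature."""
    return signature(dut, lfsr_patterns(seed=1, count=15)) == golden

golden = signature(good_logic, lfsr_patterns(seed=1, count=15))
print("good device passes:", logic_bist(good_logic, golden))
print("faulty device passes:", logic_bist(faulty_logic, golden))
```

Only the single golden signature has to be stored and compared, which is the saving in test data that the text refers to.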

8.1.2 Boundary Scan DFT

JTAG or boundary scan is primarily used for testing the board connections, without unplugging the chip from the board. The JTAG controller and surrounding logic may also be generated directly by DC. Boundary-scan insertion is trivial, since the whole process is rather simple and mostly automatic. It is therefore the intent of this chapter to concentrate solely on the scan insertion techniques and issues. Readers are advised to refer to the Design Compiler Reference Manual for boundary scan insertion techniques.

8.2 Scan Insertion

Scan is one of the most widespread DFT techniques used by design engineers to test the chip for defects such as stuck-at faults. It is possible to attain a very high fault coverage (usually above 95%) for most designs.

The scan insertion technique involves replacing all the flip-flops in the design with special flops that contain built-in logic, solely for testability. The most prevalently used architecture is the multiplexed flip-flop. This type of architecture incorporates a 2-input mux at the input of the D-type flip-flop. The select line of the mux determines the mode of the device, i.e., it enables the mux to be either in the normal operational mode (functional mode with normal data going in) or in the test mode (with scanned data going in). These scan-flops are linked together (using the scan-data input of the mux) to form a scan-chain that functions like a serial shift register. During scan mode, a combination of patterns is applied to the primary inputs, and shifted out through the scan-chain. If done correctly, this technique provides a very high coverage for all the combinational and sequential logic within a chip.

Other architectures available along with multiplexed type flip-flops are the LSSD structure, the clocked scan structure, etc. As mentioned above, the most commonly used architecture is the multiplexed flip-flop. For this reason, throughout this section the focus remains on the multiplexed flip-flop type architecture for DFT scan insertion.

Scan can also be used to test the DUT for any possible timing violations. In order to understand this, we need to dig deeper into the operation of the scan technique. Basically, scan uses two cycles: Capture and Shift. Scan data is injected from the primary inputs into the device, where it is captured by the flops (going through logic) and is then shifted out to primary outputs, where it is compared against the expected results. The signal that selects between the capture and shift cycle is usually called the scan_enable signal. In addition, another signal is also used, which is usually called the scan_mode signal. The scan mode signal is used to put the DUT in test conditions. In general, designs are modified such that under test conditions the device behaves differently from its normal functional behavior. This modification is desired in order to achieve greater control and/or observability. Sometimes it is done simply to comply with the strict DFT rules.

The following list summarizes the basic scan operation:

1. Load/Unload scan chain (shift cycle)
2. Force primary inputs (except clocks)
3. Measure primary outputs
4. Pulse clock to capture functional data (capture cycle)

In order to understand how scan can be used to test the device for timing, a basic understanding of the shift and capture cycles is necessary.

8.2.1 Shift and Capture Cycles

During the shift cycle the data traverses the entire design through a big daisy chain of registers. These registers are chained together to behave like a shift register (hence the name: shift cycle). The main difference between the shift and capture cycles is that the capture cycle utilizes the functional “D” input of the flop, whereas the shift cycle uses the “SD” input of the flop (Figure 8-1). Toggling the scan_enable input performs the selection of which input to use. Thus, data that is captured by the “capture” cycle is simply shifted out for comparison by the tester. In other words, a single clock cycle is needed to perform the capture operation, while several clock cycles (depending on the length of the scan chain) are needed to perform the shift operation.

Figure 8-1 illustrates the behavior of the shift and capture cycles. Data is injected into the device through primary inputs and is shifted out of the device through the “SD” input port of the flops. Assume the scan_en port is active high for the shift operation. Once the chain has been flushed out and compared, the scan_en signal is toggled (driven low). Now a single clock pulse is applied to capture the data into the flops through the “D” inputs, before scan_en is toggled again (driven high) and the data shifted out for comparison.
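The shift, capture and unload sequence can be modeled behaviorally in Python. This sketch makes several simplifying assumptions: the class ScanChain and the helper scan_test are hypothetical names, the functional logic is reduced to a function from the current flop outputs to the next “D” values, and primary inputs and outputs are left out.

```python
class ScanChain:
    """Chain of multiplexed scan flops; flops[0] is nearest the scan input."""
    def __init__(self, length, logic):
        self.flops = [0] * length
        self.logic = logic            # maps list of Q values to list of D values

    def clock(self, scan_en, scan_in=0):
        """Apply one clock pulse; return the scan-out bit seen before the edge."""
        scan_out = self.flops[-1]
        if scan_en:                   # shift cycle: every flop loads its SD input
            self.flops = [scan_in] + self.flops[:-1]
        else:                         # capture cycle: every flop loads its D input
            self.flops = self.logic(self.flops)
        return scan_out

def scan_test(chain, stimulus):
    n = len(chain.flops)
    for bit in stimulus:              # load: one shift clock per stimulus bit
        chain.clock(scan_en=1, scan_in=bit)
    chain.clock(scan_en=0)            # a single capture pulse
    return [chain.clock(scan_en=1) for _ in range(n)]   # unload for comparison

def invert(q):
    # Stand-in functional logic: each D input is the inverse of that flop's Q.
    return [1 - b for b in q]

response = scan_test(ScanChain(3, invert), [1, 1, 0])
print("response:", response)
```

Note that the load and unload each take as many clocks as there are flops in the chain, while the capture takes exactly one, matching the description above.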

One interesting thing to note here is that during the capture cycle, data traverses the functional path. In other words, it goes through the logic just as it would if the device were operating under normal conditions. Thus if a clock pulse of the same frequency as the functional clock is used to perform the capture cycle, all timing relationships present between flop-to-flop can also be checked during scan testing. This basically means that functional testing is not needed if the scan coverage is high and the frequency of the scan clock is the same as that of the functional clock.

The above case is valid for most design structures. However, in real designs there are generally cases where the functional path may be different from the scan path. One such scenario is depicted in Figure 8-2.


In Figure 8-2, during the functional mode of operation, an inverted clock is being fed to the second register. However, to make scan chains balanced or to make scan insertion simple, a favorite method used by most designers is to introduce a test signal called scan_mode. During test, the scan_mode selects the non-inverted clock path, whereas in functional mode the scan_mode is set such that the clock is inverted. In this case, the capture cycle differs from the functional cycle. To test the timing for the path originating from the “Q” output of the first flop and ending at the “D” input of the second flop, the scan capture cycle cannot be used. Designers will have to manually write test-benches to test this particular path for timing.


8.2.2 RTL Checking

This is a new feature recently introduced by Synopsys. It is part of DFTC and is used to check the RTL for any possible DFT rule violations. This is one of the most powerful and useful features, enabling designers to check the RTL design for DFT rules early on in the design cycle. It is invoked through a command called rtldrc. The following script summarizes the use of this feature. Commands related to RTL checking are highlighted in bold.

dc_shell-t> analyze -f verilog mydesign.v
dc_shell-t> elaborate mydesign
dc_shell-t> set_scan_configuration -style multiplexed_flip_flop
dc_shell-t> set hdlin_enable_rtldrc_info true
dc_shell-t> create_test_clock -period 100 -waveform {45 55} clk
dc_shell-t> set_test_hold 1 scan_mode
dc_shell-t> set_signal_type test_asynch_inverted reset
dc_shell-t> rtldrc > reportfile

Setting the variable hdlin_enable_rtldrc_info to "true" instructs DC to generate a report that points to the actual line number of the source RTL (the possible cause of the DRC violation). Without this variable, the report does not contain any line numbers. The default is "false".


It is important to note that the above script may be included as part of the final script used for synthesis or it may be run stand-alone by the designers to check the validity of the design for DRC rules.

8.2.3 Making Design Scannable

Synopsys provides designers the capability to perform scan insertion automatically, through its test-ready (or one-pass) compile feature. This technique allows designers to synthesize the design and map the logic directly to the scan-flops, thus alleviating the potential need for post-insertion adjustments.

Companies that do not use Synopsys tools for scan insertion rely on other means to perform the same task. In such a case, scan insertion is performed by replacing the normal flops in the synthesized netlist with their scan equivalent flops, and then linking the scan chains together. It is strongly recommended that static timing analysis be performed again on the scan-inserted netlist, since some difference may exist between the characterized timing of the scan flops and their equivalent non-scan (normal) flops. This difference, if not corrected, may adversely affect the total slack of the design. To avoid this problem, library developers usually assign the scan-flop timing to the normal flops as well.

To enable the Synopsys test-ready compile feature, the scan style should be chosen prior to compilation. On a particular design, the set_scan_configuration command is used to instruct DC on how to implement scan. There are various options available for this command that may be used to control the scan implementation. Among others, these include options for clock mixing, number of scan chains and the scan style. Only some of the most commonly used options, with arbitrary arguments, are listed below for the sake of explanation. Users are advised to refer to the Design Compiler Reference Manual for syntax and the available range of options.

dc_shell-t> set_scan_configuration -style multiplexed_flip_flop \
                -methodology full_scan \
                -clock_mixing no_mix \
                -chain_count 2
dc_shell-t> create_test_clock -period 100 -waveform {45 55} clk
dc_shell-t> set_test_hold 1 scan_mode
dc_shell-t> set_scan_signal test_scan_enable \
                -port scan_en -hookup pad/scan_en_pad/Z
dc_shell-t> set_scan_signal test_scan_in -port [list PI1 PI2]
dc_shell-t> set_scan_signal test_scan_out -port [list PO1 PO2]
dc_shell-t> compile -scan
dc_shell-t> preview_scan

The create_test_clock command is used to specify the test clock that is used during scan operation. In the above example the test clock is called “clk” with a period of 100ns, rising at 45ns and falling at 55ns.

The set_test_hold command is used to specify a constant value on a port during test mode. In the above case the scan_mode port is held at logic high during scan.

The set_scan_signal command identifies the scan in/out ports of the scan chain along with the scan enable signal. Here the command specifies a port called scan_en to be used as the scan enable port and instructs DC to hook up the SE port of all flops in the design to the Z output of the pad called scan_en_pad.

The compile -scan command compiles the design directly to scan-flops without linking them in a scan-chain, i.e., the scan insertion is not performed. The design is mapped to the scan-flops directly, instead of the normal flops. The design at this point is functionally correct, but un-scannable.

The preview_scan command is used to preview the scan architecture chosen by the set_scan_configuration command.

It is highly recommended that the check_test command be used after compilation, to check the design for testability-related rule violations. DC flags any violations by issuing warnings/errors. Failure to fix these violations invariably results in reduced test coverage. The violations may occur due to various DFT-related issues encountered during scan insertion. Some of these issues and their solutions are discussed in the next section.

dc_shell-t> check_test


It is the designer’s responsibility to correct the violations. This is mainly achieved by adding extra logic around the “problem” area to provide control to the test logic. In order to fix these problems, modifying the source RTL instead of the netlist is the recommended approach. This approach allows the source RTL to remain the “golden” database that may be used for reference at some later stage. On the other hand, if the netlist is modified, then the changes may be forgotten, and thus lost, after the design is taped out.

Although the scan has not yet been inserted, an additional step at this point is to get an estimate of the fault coverage of the design by generating the statistical ATPG test patterns. This step helps quantify the quality of the design at an earlier stage. If the coverage number is low, then the only option is to identify and fix the areas that need further improvement. However, if the fault coverage number is high, then this is an indication to proceed.

It must be noted that the fault coverage numbers should be considered as best case only, due to the fact that the design may be part of a larger hierarchy, i.e., it may be a sub-block. At the sub-block level, the input port controllability and output port observability may be different from when this sub-block is embedded in the full design (top-level). For a full design, this may cause a lower fault coverage number than expected. The following command is used to generate the statistical test patterns:

dc_shell-t> create_test_patterns -sample <n>
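As a rough picture of what a sampled fault coverage estimate means, the following Python sketch injects single stuck-at faults into a tiny circuit and counts how many are detected by a small random sample of patterns. The circuit, the net names and the functions simulate and fault_coverage are all invented for this illustration; it does not reproduce the algorithm used by create_test_patterns.

```python
import random

NETS = ["a", "b", "c", "n1", "y"]

def simulate(pattern, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one net stuck at 0 or 1."""
    def val(net, value):
        # Override the net value if this is the faulty net.
        return fault[1] if fault and fault[0] == net else value
    a = val("a", pattern[0])
    b = val("b", pattern[1])
    c = val("c", pattern[2])
    n1 = val("n1", a & b)
    return val("y", n1 | c)

def fault_coverage(patterns):
    """Fraction of single stuck-at faults detected by at least one pattern."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = sum(
        any(simulate(p) != simulate(p, f) for p in patterns) for f in faults)
    return detected / len(faults)

rng = random.Random(0)      # fixed seed so the sample is repeatable
sample = [tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(4)]
print(f"estimated coverage from sample: {fault_coverage(sample):.0%}")
```

A fault counts as detected when some pattern makes the faulty output differ from the good output, so the estimate depends on which patterns happen to be in the sample.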

Once the problem areas have been identified and fixed in the RTL, the design is ready for scan insertion. The following command performs the scan insertion:

dc_shell-t> insert_scan

The insert_scan command does more than just link the scan-flops together to form a scan-chain. It also disables the tri-states, builds and orders the scan-chains, and optimizes them to remove any DRCs. This command may insert additional test logic in order to get better control over certain parts of the design.


After scan insertion the design should once again be checked for any rule violations through the check_test command. The report_test command may also be utilized to generate all test-related information about the design. Various options exist to control the output of the report. More details can be found in the Design Compiler Reference Manual.

8.2.4 Existing Scan

Designs with existing scan chains need to be treated differently. Such a case may exist when you are importing a design that has been scan inserted by a foreign tool other than Synopsys DFTC. In this case, the “db” file does not exist. The input to DFTC is a scan-inserted structured netlist. Thus all scan attributes that would have been part of the “db” file are also absent. In other words, DFTC does not know anything about the scan ports, resets, etc.

The scan attributes can be re-applied to the structured netlist in order to perform further processing (such as scan chain ordering through PhyC). This can be accomplished by using the following script. The items of interest that differentiate this from the original one-pass synthesis approach have been highlighted in bold.

dc_shell-t> set_scan_configuration -style multiplexed_flip_flop \
                -methodology full_scan \
                -existing_scan true

dc_shell-t> create_test_clock -period 100 -waveform {45 55} clk
dc_shell-t> set_test_hold 1 scan_mode
dc_shell-t> set_signal_type test_scan_enable scan_en
dc_shell-t> set_signal_type test_mode scan_mode

# For active low reset, use test_asynch_invert. For active high, use test_asynch.
dc_shell-t> set_signal_type test_asynch reset

dc_shell-t> set_signal_type test_scan_in [list PI1 PI2]
dc_shell-t> set_signal_type test_scan_out [list PO1 PO2]

8.2.5 Scan Chain Ordering

The advantages of scan chain ordering are enormous. However, usually with every good thing there is also something bad associated with it.

The following are some of the benefits of scan chain reordering:
1. Reduces congestion, thus improves timing
2. Less overall area (net length is dramatically reduced)
3. Improves setup time of functional paths due to decreased flop loading
4. Reduces negative hold-times (mainly a simulation vs. static timing analysis issue)
5. Improves timing due to less overall capacitance
6. Improves power consumption by driving less net capacitance
7. Better clock tree (lower latency and fewer buffers), thus improving timing along with low power consumption.

The disadvantages are:
1. Increases the chance of hold-time violations in scan-path
2. Additional runtime in the design cycle.

Scan chain ordering is performed using DFTC, Physical Compiler (PhyC) or your own layout tool. Chapter 10 describes in detail the PhyC approach of ordering the scan chains based on physical proximity of the scan cells. The DFTC approach is very similar to PhyC. Instead of using physopt, the following option may be used for the insert_scan command to stitch the scan chains more intelligently based on the physical placement location of scan cells.

dc_shell-t> insert_scan -physical

The above method assumes that the physical information has been back-annotated (in PDEF format) to the design before insert_scan is run. The command to back-annotate the physical information in PDEF format is as follows:

dc_shell-t> read_pdef <PDEF file>
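
As a rough illustration of what ordering by physical proximity means, the following Python sketch walks a set of placed scan cells with a greedy nearest-neighbor rule. It is not the algorithm used by DFTC or PhyC; the cell names, coordinates and scan-in location are hypothetical.

```python
# Illustrative only: greedy nearest-neighbor ordering of scan cells by
# placement location. Cell names and coordinates are hypothetical, and this
# is not the algorithm implemented by insert_scan -physical.

def manhattan(a, b):
    """Manhattan distance between two (x, y) placement locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def order_by_proximity(cells, start):
    """Order cell names by repeatedly hopping to the nearest unvisited cell,
    beginning from the scan-in location `start`. Ties break on cell name."""
    remaining = dict(cells)
    here = start
    chain = []
    while remaining:
        name = min(remaining, key=lambda n: (manhattan(here, remaining[n]), n))
        here = remaining.pop(name)
        chain.append(name)
    return chain

def chain_length(cells, order, start):
    """Total Manhattan length of the scan path for a given cell order."""
    total, here = 0, start
    for name in order:
        total += manhattan(here, cells[name])
        here = cells[name]
    return total

cells = {"ff_a": (90, 10), "ff_b": (10, 10), "ff_c": (50, 10), "ff_d": (90, 40)}
netlist_order = ["ff_a", "ff_b", "ff_c", "ff_d"]   # order before placement is known
placed_order = order_by_proximity(cells, start=(0, 0))

print(placed_order, chain_length(cells, placed_order, (0, 0)))
print(netlist_order, chain_length(cells, netlist_order, (0, 0)))
```

In this toy case the proximity-based order needs less than half the scan routing of the netlist order, which is the effect behind the congestion and capacitance benefits listed earlier.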

It must be remembered that scan chain ordering increases the chance of hold-time violations. This is because the flops are near each other, which creates a very short route for the shift-cycle scan path. In other words, the Q to SD (Figure 8-1) wire length is very short. Thus, scan data may arrive faster than the clock, causing a hold-time violation.

Interestingly, DFTC orders the scan chain in an elegant fashion. It analyzes the logic fed by the source register and tries to find a cell that can be used in such a way so as not to alter the functionality of the shift cycle. It taps the output of this cell to connect to the scan-in port of the destination register.

Consider the diagram shown in Figure 8-3. Here the Q output of the source register is feeding a buffer before it encounters the rest of the logic. In this case, insert_scan links the output of the buffer to the SD input of the destination flop. By doing this, there is an additional delay of the buffer in the scan path. This delay minimizes the chance of any hold-time violations at the destination flop.

It must be noted that it does not have to be a buffer (as shown in Figure 8-3). In reality, it can be any cell as long as the functionality of the shift cycle is maintained. For example, insert_scan can tap the QN output of the source cell and subsequently use an inverter’s output to link to the SD input of the destination flop. In other words, QN of the source flop connects to the inverter’s input pin, with the output of the inverter connected to the SD input of the destination flop.
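
The effect of the extra cell delay can be illustrated with a simple hold check. The Python sketch below uses the usual first-order relation (data arrival must exceed the hold requirement plus clock skew). All delay values are hypothetical and chosen only to show the sign change.

```python
# Illustrative hold check on a scan shift path. All delay values (in ns) are
# hypothetical and are not taken from any library.

def hold_slack(clk_to_q, path_delay, hold_time, clock_skew):
    """Hold slack at the destination flop's SD pin.

    clock_skew is the destination clock arrival minus the source clock
    arrival. A negative result means a hold-time violation.
    """
    data_arrival = clk_to_q + path_delay
    return data_arrival - (hold_time + clock_skew)

# Direct Q-to-SD connection between two adjacent flops: almost no wire delay.
direct = hold_slack(clk_to_q=0.20, path_delay=0.02,
                    hold_time=0.10, clock_skew=0.25)

# Tapping the output of an existing buffer adds that buffer's delay (0.18 here).
via_buffer = hold_slack(clk_to_q=0.20, path_delay=0.02 + 0.18,
                        hold_time=0.10, clock_skew=0.25)

print(f"direct: {direct:+.2f} ns, via buffer: {via_buffer:+.2f} ns")
```

With these numbers the direct connection violates hold while the path through the buffer meets it, without adding any new cell to the design.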

As mentioned before, scan-chain ordering increases the chance of hold-time violations. By utilizing this method, the likelihood of hold-time violations is minimized. This behavior of insert_scan is governed by the following variable:

dc_shell-t> set test_disable_find_best_scan_out false

The default is “false”, which means that insert_scan will analyze the logic and find the best way possible to stitch the scan chains. If the value is changed to true, insert_scan will tap the output Q and link it to pin SD of the destination flop. It will not try to analyze the logic fed by Q.

8.2.6 Test Pattern Generation

Upon completion of scan insertion in the design, the test patterns may be generated for the entire design using TetraMAX. This is an independent ATPG tool that is used solely for creating test patterns and provides a seamless interface to DC. It also provides an enhanced GUI for analysis and debugging.

The ATPG discussion and the TetraMAX usage are beyond the scope of this book. Readers are advised to consult the TetraMAX ATPG User Guide for further details.

During dynamic simulation, the test patterns are used as input stimuli to the design to exercise all the scan paths. This step should be performed at the full chip level and preferably after layout.

8.2.7 Putting it Together

Scan insertion is a complicated topic and usually there are many methods of making designs scannable. In order to alleviate the confusion, this section provides an example script that consolidates all the information provided above.

Script for one–pass scan synthesis, insertion & order

dc_shell-t> analyze -f verilog mydesign.v
dc_shell-t> elaborate mydesign
dc_shell-t> set_scan_configuration -style multiplexed_flip_flop \
                -methodology full_scan \
                -clock_mixing no_mix \
                -chain_count 2
dc_shell-t> set hdlin_enable_rtldrc_info true
dc_shell-t> create_test_clock -period 100 -waveform {45 55} clk
dc_shell-t> set_test_hold 1 scan_mode
dc_shell-t> set_signal_type test_asynch reset
dc_shell-t> rtldrc
dc_shell-t> source constraints.scr ;# clocks, input/output delays etc.
dc_shell-t> set_scan_signal test_scan_enable \
                -port scan_en -hookup pad/scan_en_pad/Z
dc_shell-t> set_scan_signal test_scan_in -port [list PI1 PI2]
dc_shell-t> set_scan_signal test_scan_out -port [list PO1 PO2]
dc_shell-t> compile -scan
dc_shell-t> preview_scan
dc_shell-t> check_test
dc_shell-t> read_pdef mydesign_floorplan.pdef
dc_shell-t> insert_scan -physical
dc_shell-t> check_test
dc_shell-t> write -format verilog -hierarchy -output mydesign.sv
dc_shell-t> write_pdef -v3.0 -output mydesign_scan.pdef

Script for ordering scan chains for an existing netlist

dc_shell-t> read_verilog mydesign.sv ;# Gate level netlist
dc_shell-t> current_design mydesign
dc_shell-t> set_scan_configuration -style multiplexed_flip_flop \
                -methodology full_scan \
                -clock_mixing no_mix \
                -existing_scan true \
                -chain_count 2
dc_shell-t> source constraints.scr ;# clocks, input/output delays etc.
dc_shell-t> set_signal_type test_scan_enable scan_en
dc_shell-t> set_signal_type test_mode scan_mode
dc_shell-t> set_signal_type test_asynch reset
dc_shell-t> set_signal_type test_scan_in [list PI1 PI2]
dc_shell-t> set_signal_type test_scan_out [list PO1 PO2]
dc_shell-t> check_test
dc_shell-t> read_pdef mydesign_floorplan.pdef
dc_shell-t> insert_scan -physical
dc_shell-t> check_test
dc_shell-t> write -format verilog -hierarchy -output mydesign.sv
dc_shell-t> write_pdef -v3.0 -output mydesign_scan.pdef

Note: The scripts provided above use DFTC only. PhyC also offers this capability and is superior to using DFTC alone. The PhyC flow is described in Chapter 10.

8.3 DFT Guidelines

Obtaining high fault coverage for a design depends on the quality of the implemented DFT logic. Not all designs are ideal. Most “real-world” designs suffer from a variety of DFT related issues which, if left unsolved, result in reduced fault coverage. This section identifies some of these issues and provides solutions to overcome them.

8.3.1 Tri-State Bus Contention

This is one of the common problems faced by the DFT tool. During scan shifts, multiple drivers on a bus may drive the bus simultaneously, thus causing contention. Fixing this problem requires that only one driver be active at a given time. This can be achieved by adding decoder logic to the design, which controls the enable input of each tri-state driver through a mux. The mux is used to select between the normal signal (in the functional mode) and the control line from the decoder. The decoder control is selected only during the scan-mode.

The decoder inputs are generally controlled directly from the primary inputs, thus providing a means to selectively turn on the tri-state drivers, thereby avoiding contention.
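
The decoder-plus-mux arrangement can be modeled behaviorally. The Python sketch below is an assumption-laden illustration (four drivers, an integer select value standing in for the primary inputs), not RTL from the book.

```python
# Illustrative model of the enable logic: in scan-mode a decoder driven from
# primary inputs enables exactly one tri-state driver. The number of drivers
# and the signal names are hypothetical.

def decoder(select, n_drivers):
    """One-hot decode of an integer select value into n_drivers enable lines."""
    return [1 if i == select else 0 for i in range(n_drivers)]

def driver_enables(scan_mode, functional_enables, select):
    """Per-driver mux: functional enable in normal mode, decoder output in
    scan-mode."""
    decoded = decoder(select, len(functional_enables))
    return [d if scan_mode else f for f, d in zip(functional_enables, decoded)]

def has_contention(enables):
    """More than one active driver on the bus means contention."""
    return sum(enables) > 1

# During scan shift the functional enables are driven by arbitrary shifted data.
shifted_data = [1, 0, 1, 1]
protected = driver_enables(scan_mode=1, functional_enables=shifted_data, select=2)

print(has_contention(shifted_data), protected, has_contention(protected))
```

Whatever pattern is being shifted through the flops that normally drive the enables, the decoder guarantees a one-hot enable vector while scan-mode is asserted.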

8.3.2 Latches

Avoid using latches as much as possible. Although latches occupy less area than flops, they are difficult to test. Testing may be difficult but is not entirely impossible. Making them transparent during scan-mode can make them testable. This usually means adding control logic (for the clock) to each latch. If a single independent clock drives all the latches, then a single test-logic block may be used to control the clock to make the latches transparent during scan-mode.

8.3.3 Gated Reset or Preset

DFT requires that the reset/preset of a flop be controllable. If the reset/preset to a flop is functionally gated in the design, then the flop is un-scannable. To avoid this situation, the reset/preset signal should bypass the gating logic in scan-mode. A mux is generally used to remedy this problem, with the external scan-mode signal functioning as its select line, and the bypass reset/preset signal along with the original gated signal as its inputs.

8.3.4 Gated or Generated Clocks

The gated clocks also suffer from the same issue that has been described above for gated resets. DFT requires that the clock input of the flop be controllable. The solution again is to bypass the gating logic through a mux, to make the flop controllable.

This problem is prevalent in those designs that contain logic to generate divided clocks. The flop(s) used to generate the divided clock should be bypassed during scan-mode. The dividing logic in this case may become un-scannable, but the divided clock can be controlled externally, thus providing coverage for the rest of the design. The small loss of coverage for the dividing logic is offset by the coverage gains achieved for the entire design.

In Figure 8-4, the secondary clock is controlled externally, by using a mux that bypasses the CLK signal in the scan-mode. This provides controllability of the secondary clock, for the rest of the design. Depending upon the type of dividing logic being used, some parts of the logic may be un-scannable. The following command may be used to inform DFTC to exclude a list of sequential cells while inserting scan:

dc_shell> set_scan_element false <list of cells or designs>

8.3.5 Use Single Edge of the Clock

Most designs are coded using a single edge of the clock as reference. However, there are always cases within a design where both the rising and falling edges of the clock are used. This creates a problem for DFTC, as it is unable to handle such situations. The problem may be avoided by using the same clock edge for the entire design when the design is in the scan-mode. This is illustrated in the following VHDL example:

process(clk, test_mode)
begin
  if (test_mode = '1') then
    muxed_clk_output <= clk;
  else
    muxed_clk_output <= not(clk);
  end if;
end process;

The above VHDL code infers a two-input mux. The positive edge of the clock is used during scan-mode, while the falling edge of the clock is used during normal mode.

8.3.6 Multiple Clock Domains

It is strongly recommended that the designer assign separate scan-chains for each clock domain. Intermixing of clock domains within a scan-chain typically leads to timing problems. This is attributed to the differences in clock skew between different clock domains. A disadvantage of this technique is that it may lead to varying lengths of scan-chains.
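
A minimal Python sketch of this partitioning, using hypothetical flop and clock names, shows both the grouping and the resulting imbalance in chain lengths:

```python
# Illustrative only: one scan chain per clock domain. Flop and clock names
# are hypothetical.
from collections import defaultdict

def chains_per_domain(flops):
    """Group (flop_name, clock_name) pairs into one scan chain per clock."""
    chains = defaultdict(list)
    for name, clock in flops:
        chains[clock].append(name)
    return dict(chains)

flops = [
    ("u1/r0", "clk_a"), ("u1/r1", "clk_a"), ("u2/r0", "clk_b"),
    ("u1/r2", "clk_a"), ("u3/r0", "clk_c"),
]
chains = chains_per_domain(flops)
lengths = {clock: len(chain) for clock, chain in chains.items()}

print(chains)
print(lengths)
```

Because the tester shifts all chains in parallel, the longest chain (clk_a here) sets the shift time, which is why uneven chain lengths are counted as a disadvantage.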

An alternative solution is to group all flops belonging to a common clock domain, and connect them serially to form a single scan-chain. This requires the clock skew between the clock domains to be minimal. The clock sources should also be accessible from outside (primary inputs), so that the timing can be externally controlled when the device is tested at the tester.

There are other solutions available to this problem. One such solution is to use clock muxing at the clock source, so that only one clock is used during scan-mode.

8.3.7 Order Scan-Chains to Minimize Clock Skew

Presence of clock skew within a scan-chain usually causes hold-time violations. Some designers think that since testing is performed at a slower speed as compared to the normal operational speed, the scan-chains cannot have any timing problems. This is a misconception. Only the setup-time is frequency dependent, while the hold-time is frequency independent. Therefore, it is extremely important to minimize clock skews to avoid any hold-time violations in the scan-chain.
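
The frequency argument can be checked with first-order slack equations. In the Python sketch below the clock period appears only in the setup expression; the delay values are hypothetical.

```python
# Illustrative first-order slack equations (values in ns, all hypothetical).
# skew = capture clock arrival minus launch clock arrival.

def setup_slack(period, clk_to_q, path_delay, setup_time, skew):
    """Setup is checked against the next clock edge, so the period matters."""
    return (period + skew) - (clk_to_q + path_delay + setup_time)

def hold_slack(clk_to_q, path_delay, hold_time, skew):
    """Hold is checked against the same clock edge, so no period term exists."""
    return (clk_to_q + path_delay) - (hold_time + skew)

timing = dict(clk_to_q=0.20, path_delay=0.05, skew=0.30)

fast_setup = setup_slack(period=10.0, setup_time=0.15, **timing)
slow_setup = setup_slack(period=100.0, setup_time=0.15, **timing)
hold = hold_slack(hold_time=0.10, **timing)

print(f"setup slack at 10 ns:  {fast_setup:.2f}")
print(f"setup slack at 100 ns: {slow_setup:.2f}")
print(f"hold slack at any period: {hold:.2f}")
```

Slowing the tester clock from 10 ns to 100 ns adds 90 ns of setup slack but leaves the negative hold slack untouched.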

The scan-chain may be re-ordered with flops having greater clock latency nearer to the source of the scan-chain, while the flops with less clock latency are kept farther away. This helps in reducing the clock skew, thereby minimizing the possibility of any hold-time violations.
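
A small Python sketch of this re-ordering rule follows. The latency values are hypothetical, and a real flow would take them from the clock tree after layout.

```python
# Illustrative only: place flops with larger clock latency nearer scan-in.
# Latencies (ns) are hypothetical.

def order_by_latency(latencies):
    """Scan order with the largest clock latency first (nearest scan-in)."""
    return sorted(latencies, key=lambda name: (-latencies[name], name))

def worst_hold_skew(order, latencies):
    """Largest (capture latency - launch latency) over adjacent chain stages.
    A positive value eats into hold margin; zero or negative does not."""
    return max(latencies[dst] - latencies[src]
               for src, dst in zip(order, order[1:]))

latencies = {"r0": 0.40, "r1": 0.90, "r2": 0.65, "r3": 0.55}
arbitrary = ["r0", "r1", "r2", "r3"]
reordered = order_by_latency(latencies)

print(arbitrary, worst_hold_skew(arbitrary, latencies))
print(reordered, worst_hold_skew(reordered, latencies))
```

After re-ordering, every capturing flop receives its clock no later than the flop that launches its scan data, so skew no longer works against the hold check on any stage.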

8.3.8 Logic Un-Scannable due to Memory Element

As explained earlier, the memory itself can be tested by the use of memory BIST circuitry. However, memory elements (e.g., RAMs) that do not have scan-chains (usually built-in) surrounding them cause a loss of coverage for the combinational logic present at their inputs and outputs.

Let us consider the case of a RAM that is being fed by combinational logic. The logic present at its inputs is shadowed by the RAM, and is thus un-testable. If the inputs to the memory element are not coming directly from sequential elements, then any combinational logic present between the sequential logic and the memory element becomes un-testable. To avoid this situation, one may bypass the RAM in scan-mode. This is achieved by short-circuiting all the inputs feeding the RAM to the outputs of the RAM, through a mux. In scan-mode, the mux enables the short-circuited path and enables data to bypass the RAM.

Another problem that typically arises during scan-mode is that the outputs of the memory element are unknown. This typically results in ‘unknowns’ being introduced to the surrounding scan-chain, causing it to fail. This situation can be avoided by using the bypass method described above. The ‘unknowns’ generated by the RAM are blocked by the mux present at its outputs. Because the mux is selected to bypass the RAM, it prevents the propagation of ‘unknowns’.
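
The bypass can be modeled behaviorally as a mux at the RAM outputs. The Python sketch below assumes, purely for illustration, that the bypassed input bus and the RAM output bus have the same width.

```python
# Illustrative behavioral model of the RAM bypass mux. Bus widths and values
# are hypothetical; "x" stands for an unknown logic value.
X = "x"

def ram_output_stage(scan_mode, ram_inputs, ram_outputs):
    """Mux at the RAM outputs: in scan-mode the RAM inputs are passed straight
    through, otherwise the real RAM outputs are used."""
    return list(ram_inputs) if scan_mode else list(ram_outputs)

ram_inputs = [1, 0, 1, 1]        # values produced by the upstream logic
ram_outputs = [X, X, X, X]       # RAM contents are not initialized during scan

scan_view = ram_output_stage(1, ram_inputs, ram_outputs)
func_view = ram_output_stage(0, ram_inputs, ram_outputs)

print("scan-mode:", scan_view)
print("functional:", func_view)
```

In scan-mode the downstream flops capture known values that depend on the upstream combinational logic, so that logic becomes observable and no unknowns enter the chain.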

8.4 Chapter Summary

DFT techniques are essential to efficient and successful testing of the manufactured device. By implementing DFT features early in the design cycle, full test coverage on the design may be achieved, thereby reducing the debugging time normally spent at the tester after the device is fabricated.

This chapter described the basic testability techniques that are currently in use, including a brief description of logic and memory BIST that is not yet supported by Synopsys.

A detailed description was provided for the scan insertion DFT technique, using the DFT Compiler. Various guidelines and solutions were also provided that may help the user to identify the issues and problems related to this technique.

9

LINKS TO LAYOUT AND POST LAYOUT OPTIMIZATION
Including Clock Tree Insertion

Until now, a virtual wall existed between the front-end and the back-end processes, with the signoff to the ASIC vendor for fabrication occurring at the structural netlist level. The ASIC vendor was responsible for floorplanning and routing of the design, and provided the front-end designers with the resulting delay data. However, this process was inefficient and often resulted in multiple exchanges of the netlist and the layout data between the designers and the ASIC vendor.

As we move deeper into the VDSM realm, the virtual wall between the front-end and the back-end is destined to collapse. This is because of the tremendous challenges and difficulties posed by the VDSM technologies. In order to overcome these difficulties, it is becoming evident that greater controllability and flexibility of the ASIC design flow is necessary. This requires total integration between the synthesis and layout processes. This means that designers are now compelled to perform their own layout. Instead of receiving the structural netlist, the ASIC vendors are now given the physical database for final fabrication.

This shift in the signoff has resulted in a well-defined interface between the synthesis tools and Place & Route tools (referred to as layout tools, from here onwards). Synopsys refers to this interface as Links to Layout or LTL.

This chapter describes the interface between DC and the layout tool. Almost all designs require the LTL interface to conduct the post-layout optimizations (PLO). Also, this chapter provides different strategies used for PLO. Furthermore, for successful layout, a section is devoted to clock tree synthesis, as performed by the layout tool.

Assume that the user has synthesized and optimized a design, and that the design meets all timing and area requirements. Now the question arises, “How close are the estimated wire-load models used for pre-layout optimization to the actual extracted data from the layout?” The only way to find this information is to floorplan and then route the design.

With shrinking geometries, the resistance of the wires is increasing as compared to their capacitance. This results in a large portion of the total delay (cell delay + interconnect delay) being dominated by the delays associated with the interconnect wires. In order to reduce this effect, designers are forced to spend an increased amount of time floorplanning the chip. Therefore, it is imperative for DC to make use of the physical information, in order to perform further optimizations.

Using Links to Layout, or LTL for short, one can exchange relevant data (e.g., timing constraints and/or placement information) between DC and the layout tool. This helps DC perform improved post-layout optimizations. It also results in reduced iterations between synthesis and layout.

Note: With the introduction of Physical Compiler (or PhyC), two design flows exist: the traditional flow (described in this chapter) and the PhyC based flow (described in Chapter 10). However, some parts of the traditional flow are still relevant to the PhyC based flow. Therefore, in order to obtain full understanding of the entire flow, it is strongly recommended that the reader read this chapter before proceeding to Chapter 10.

9.1 Generating Netlist for Layout

Most layout tools accept only the Verilog or EDIF netlist format as inputs. Many users, who code the design in VHDL, generate the netlist from DC in EDIF format for layout. Although this format is universal, it does have certain drawbacks. Primarily, the EDIF format is not easily readable; therefore modifying the netlist at a later stage to perform an ECO is cumbersome. Secondly, the netlist in EDIF format cannot be simulated.

So the question is, why should designers route a netlist that cannot be simulated? What happens if DC generates an incorrect netlist (bad logic) due to some bug in its EDIF translator? With EDIF, the problem will only be identified at a much later stage, while performing LVS. Therefore, it is recommended that designers generate the netlist from DC in Verilog format, as input to the layout tool. Furthermore, the Verilog format is easy to understand, which considerably simplifies the task of modifying the netlist, in case an ECO needs to be performed on the design. In addition, even if the test-bench for a design is in the VHDL format, one can still simulate the Verilog netlist by using simulators (currently available) that are capable of simulating a mixture of these languages.

Before sending the netlist (of the full design or individual block) to layout, it is recommended that the following procedure be performed on the netlist to facilitate smooth transfer of the design from DC to the layout tool.

a) Uniquify the netlist.
b) Simplify netlist by changing names of nets in the design.
c) Remove unconnected ports from the entire design.
d) Make sure that all pin names of leaf cells are visible.
e) Check for assign and tran statements.
f) Check for unintentional gating of clocks or resets.
g) Check for unresolved references.

9.1.1 Uniquify

As mentioned previously, the netlist must be uniquified in DC, in order to perform clock tree synthesis during layout. This operation generates a unique module/entity definition for a sub-block that is instantiated multiple times in the design. This may seem like an unnecessary operation that reduces the readability of the netlist, and results in increased size of the netlist. However, physically the design is considered flat by most layout tools. In other words, the blocks referenced multiple times, although identical in all respects, exist physically at separate locations. Furthermore, flops present inside these blocks also need to be connected to the clock source. This makes it obvious that separate clock-net names are required for connecting the clock tree to these blocks.

Non-uniquified netlists pose a problem during clock tree transfer from the layout tool to DC. The problem only occurs if the clock tree information alone is transferred to DC (through methods described later), without a complete netlist transfer from the layout tool to DC. In this case, only one module/entity definition for the multiply instanced blocks is present in the netlist, for a non-uniquified design. This causes a problem when the clock tree information is transferred back to DC, i.e., when modifying the design database in DC to include the buffers and additional ports in the sub-blocks. The problem is that two distinct net names (outputs of the clock tree) cannot connect to the same port of a single module/entity. Uniquifying the design solves the above problem. However, it also causes the netlist to increase in size, since it creates a separate module/entity definition for each instantiation of the block.

Some users prefer to uniquify the netlist as they traverse the hierarchy to reach the top-level, while others uniquify the whole chip at once, from the top-level. The recommended approach is to remove the dont_touch attribute from all sub-blocks of the design, before uniquifying the netlist.

The following commands may be used to remove the dont_touch attribute from the entire design, and then uniquify the netlist from the top level:

dc_shell-t> remove_attribute [get_designs -hier {*}] dont_touch

dc_shell-t> uniquify
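
What uniquifying does to the hierarchy can be sketched on a toy data structure. The Python model below is a simplification: it gives every sub-block instance its own renamed copy, and the `_0`, `_1` suffixes and module names are hypothetical rather than DC's actual naming.

```python
# Illustrative model of uniquification. A design is a dict that maps a module
# name to {instance_name: referenced_module_or_leaf_cell}. The naming scheme
# and the example design are hypothetical.

def uniquify(design, top):
    """Return a new design in which each sub-block instance below `top`
    references its own private copy of the sub-block definition."""
    result = {}
    copies = {}  # original module name -> number of copies emitted so far

    def emit(module, new_name):
        instances = {}
        for inst, ref in design[module].items():
            if ref in design:                    # hierarchical sub-block
                index = copies.get(ref, 0)
                copies[ref] = index + 1
                child_name = f"{ref}_{index}"
                emit(ref, child_name)
                instances[inst] = child_name
            else:                                # library leaf cell
                instances[inst] = ref
        result[new_name] = instances

    emit(top, top)
    return result

design = {
    "top": {"u_a": "blk", "u_b": "blk"},   # blk is instantiated twice
    "blk": {"ff": "DFF"},
}
uniq = uniquify(design, "top")
print(uniq)
```

After the transformation, clock tree buffers or extra ports can be added to `blk_0` without affecting `blk_1`, at the cost of one definition per instance.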

9.1.2 Tailoring the Netlist for Layout

Some layout tools may have difficulty reading a Verilog netlist that contains unusual net names. For instance, DC sometimes produces signal names with “*cell*” or “*-return” appended, or in-between the names. Other times, users may find that some net names (or port names) have leading or trailing underscores. Also, most layout tools have restrictions on the maximum number of characters for a net or a port name. Depending on the restrictions imposed by the specific layout tool, it is possible for the user to clean the netlist within DC, before writing it out. This ability of DC provides a smooth interface to the layout tool while meeting all tool requirements.

To prevent DC from generating the undesirable signal names, the user must first define rules, and then instruct DC to conform to these rules before writing out the netlist. For instance, one may define rules called “BORG” by including the following in the “.synopsys_dc.setup” file:

define_name_rules BORG -allowed {A-Za-z0-9_} \
    -first_restricted "0-9_\[ ]" \
    -max_length 30 \
    -map { {"*cell*", "mycell"}, {"*-return", "myreturnz"} }

Instructing DC to conform to the above rule (BORG) is performed at the command line (or through a script), by using the following command:

dc_shell-t> change_names -hierarchy -rules BORG
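
To make the intent of the BORG rules concrete, the following Python sketch applies an approximation of them to individual names. How DC actually rewrites a disallowed character or an empty result is not stated here, so replacing such characters with an underscore and falling back to the name `n` are assumptions made only for this illustration.

```python
# Approximate, illustrative emulation of the BORG name rules. The handling of
# disallowed characters ("_" substitution) and of names that become empty
# (fallback "n") is an assumption, not documented DC behavior.
import re

def apply_name_rules(name, max_length=30):
    """Return a cleaned-up version of a single net or port name."""
    # -map: replace the literal substrings called out in the rules
    name = name.replace("*cell*", "mycell").replace("*-return", "myreturnz")
    # -allowed: only letters, digits and underscore survive
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # -first_restricted: the name may not start with a digit or underscore
    name = re.sub(r"^[0-9_]+", "", name) or "n"
    # -max_length
    return name[:max_length]

for raw in ["*cell*42", "_data_bus", "9lives", "sig*-return", "a" * 40]:
    print(f"{raw!r:45} -> {apply_name_rules(raw)!r}")
```

Running a check of this kind on the written netlist is a quick way to confirm that no name violates the layout tool's restrictions.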

In addition to the above, users may also desire to alter the bus naming style in the netlist. DC provides a variable through which the user is allowed to tailor the naming style of the busses written out in the netlist. The variable may again be set in the setup file as follows:

set bus_naming_style {%s[%d]}

9.1.3 Remove Unconnected Ports

Many designs suffer from the problem of ports of a block that are left unconnected intentionally, or maybe due to legacy reasons. Although this practice has no effect on DC in terms of producing a functionally correct netlist, some designers prefer to remove these ports during synthesis. This is generally a good practice since, if left unconnected, DC will issue a warning message regarding the unconnected ports. Because a design may contain many such unconnected ports, it is possible that a real warning may get lost among the numerous unconnected-port warnings. It is therefore preferable to remove the unconnected ports and check the design, before generating the netlist. The following commands perform this:

dc_shell-t> remove_unconnected_ports [get_cells -hier {*}]

dc_shell-t> check_design

9.1.4 Visible Port Names

Generally, all synthesized designs result in mapped components that have one (or more) of their output ports not connected to a net. When DC generates a Verilog netlist, it does not write out the unconnected port names. Depending upon the layout tool, a mismatch might occur between the number of ports in the physical cell versus the number of ports of the same cell present in the netlist. For example, a D flip-flop containing 4 ports, namely D, CLK, Q and QN, may be connected as follows:

DFF dff_reg (.D(data), .CLK(clock), .Q(data_out) ) ;

In the above case, DC does not write out the QN port, since the function of the inverting QN output is not utilized in the design. Physically, this cell contains all 4 ports; therefore, when the netlist is read in the layout tool, a mismatch between the number of ports occurs. Setting the value of the following variable to true in the setup file can prevent this mismatch:

set verilogout_show_unconnected_pins true
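
The difference between the two netlist styles can be shown with a small formatter. The Python sketch below is illustrative: the empty named connection `.QN()` is standard Verilog for an unconnected port, but the exact text DC writes is an assumption.

```python
# Illustrative formatter for a Verilog cell instance. The exact layout of the
# line DC writes is an assumption; ".PIN()" is the standard Verilog form of an
# explicitly unconnected named port.

def instance_line(cell, inst, pins, connections, show_unconnected):
    """Build an instance statement, optionally listing unconnected pins."""
    ports = []
    for pin in pins:
        if pin in connections:
            ports.append(f".{pin}({connections[pin]})")
        elif show_unconnected:
            ports.append(f".{pin}()")
    return f"{cell} {inst} ({', '.join(ports)});"

pins = ["D", "CLK", "Q", "QN"]
nets = {"D": "data", "CLK": "clock", "Q": "data_out"}

hidden = instance_line("DFF", "dff_reg", pins, nets, show_unconnected=False)
visible = instance_line("DFF", "dff_reg", pins, nets, show_unconnected=True)
print(hidden)
print(visible)
```

With the unconnected pin written out, the instance lists the same four ports as the physical cell.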

Making the port names visible is solely dependent on the layout tool’s requirements. However, recently some layout tool vendors, upon realizing this limitation, have improved their tools so that the above restriction is not imposed.

9.1.5 Verilog Specific Statements

Some layout tools have difficulty reading a netlist that contains tri wires, tran primitives and assign statements. These are Verilog specific primitives and statements that are generated in the netlist for many possible reasons.

DC generates tri wires for designs containing “inout” type ports. For designs containing these types of ports, DC needs to assign values to the bi-directional port, thus producing tri wire statements and tran primitives. To prevent DC from generating these, users may use the following IO variable in the setup file. When set to true, all tri-state nets are declared as wire instead of tri.

set verilogout_no_tri true

Several factors influence the generation of the assign statements. Feedthroughs in the design are one such factor. A feedthrough occurs if the block contains an input port that is directly connected to an output port of the same block. This results in DC generating an assign statement in the Verilog netlist. Assign statements are also generated if an output port is connected to ground, or is being driven by a constant (e.g., 1'b0 or 1'b1). While writing out the netlist in Verilog format, DC issues a warning, stating that the assign statements are being written out.

In case of the feedthroughs, the user can prevent DC from generating these statements by inserting a buffer between the previously connected input and output port. This isolates the input port from the output port, thereby breaking the feedthrough. To perform this, the following command can be used before compiling the design.

dc_shell-t> set_fix_multiple_port_nets -feedthroughs

The -buffer_constants option may also be used in the above command in order to buffer the constants driving the output port. However, since there are many other variations that may produce the assign statements, it may be safer to use the following for full coverage:

dc_shell-t> set_fix_multiple_port_nets -all -buffer_constants

Many designers complain that assign statements get generated in the netlist, even after all the steps described above have been performed. In almost all cases this is caused by the dont_touch attribute present on a net without the user's knowledge. The user can find the presence of this attribute by performing a report_net command. The dont_touch attribute can be removed from the net by using the following command:

dc_shell-t> remove_attribute [get_nets <net name>] dont_touch
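
Independently of DC, the written netlist can be screened for these constructs before it is handed to layout. The Python sketch below is a simple line-based check, not a Verilog parser, and the sample netlist is hypothetical.

```python
# Illustrative line-based screen for assign, tran and tri statements in a
# gate-level Verilog netlist. This is not a Verilog parser; it only looks at
# the first keyword on each line. The sample netlist is hypothetical.
import re

def find_verilog_constructs(netlist_text):
    """Return (line_number, keyword) for each assign, tran or tri statement."""
    hits = []
    for number, line in enumerate(netlist_text.splitlines(), start=1):
        code = line.split("//", 1)[0]          # ignore trailing comments
        match = re.match(r"\s*(assign|tran|tri)\b", code)
        if match:
            hits.append((number, match.group(1)))
    return hits

netlist = """module blk (in1, out1, out2);
  input in1;
  output out1, out2;
  assign out1 = in1;   // feedthrough
  assign out2 = 1'b0;  // constant-driven output
endmodule
"""
hits = find_verilog_constructs(netlist)
print(hits)
```

An empty result for the final netlist gives some confidence that the feedthrough and constant fixes took effect.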

9.1.6 Unintentional Clock or Reset Gating

It is always a good idea to check and double-check the clocks in the design before handing the netlist over for place and route. Remember that the clock provides the reference for all signals, i.e., all signals are directly related to the clock and are optimized with respect to it. If the clock is unintentionally buffered (maybe you forgot to apply a set_dont_touch_network attribute on it), it will affect clock latency and skew, which may result in the user not being able to meet the set timing objectives.

Generally, resets are not considered as important as clocks. However, since the set_dont_touch_network attribute is also applied for them, it is wise to check their buffering.

To check for unintentional gating of the clocks, the user may use the following command:

dc_shell-t> report_transitive_fanout -clock_tree

To check for unintentional gating of another signal (say, the reset signal), you may use the -from option in the above command. For example:

dc_shell-t> report_transitive_fanout -from reset

Obviously, the clocks should be defined before the -clock_tree option can be used. Alternatively, one may also use the -from option for the clocks. This does not require the clocks to be defined first. Note that the -from and the -clock_tree options cannot be used simultaneously.

9.1.7 Unresolved References

Designers should exercise caution and always check for any unresolved references. DC issues a warning for a design containing instantiations of a block that does not have a corresponding definition. For example, block A is the top level module that instantiates sub-block B. If you fail to read the definition of block B in DC while writing out the netlist for block A, DC will generate a warning stating that block A contains unresolved references. Also, this message is issued for cases where a port mismatch occurs between the instanced cell and its definition.
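
The block A and block B case can be modeled with a simple lookup. The Python sketch below only illustrates the missing-definition case (port mismatches are not modeled), and the design and library contents are hypothetical.

```python
# Illustrative check for unresolved references on a toy hierarchy. A design
# maps each module that was read in to {instance_name: reference_name}.
# Port mismatches are not modeled; all names are hypothetical.

def unresolved_references(design, library_cells):
    """Return sorted (parent, instance, reference) tuples whose reference is
    neither a module that was read in nor a known library cell."""
    missing = []
    for parent, instances in design.items():
        for inst, ref in instances.items():
            if ref not in design and ref not in library_cells:
                missing.append((parent, inst, ref))
    return sorted(missing)

design = {"A": {"u_b": "B", "u_ff": "DFF"}}   # block B was never read in
library_cells = {"DFF", "BUF"}

missing = unresolved_references(design, library_cells)
print(missing)
```

Once the definition of B is read in as well, the same check returns an empty list.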

9.2 Layout

With a clean and optimized netlist, the user is ready to transfer the design to its physical form using the layout tool. Although layout is a complex process, it can be condensed to three basic steps, as follows:

a) Floorplanning.

b) Clock tree insertion.

c) Routing the database.

9.2.1 Floorplanning

This is considered to be the most critical step in the entire layout process. Primarily, a design is floorplanned in order to achieve the minimum possible area while still meeting timing requirements. Floorplanning is also performed to divide the design into manageable blocks.


In a broad sense, floorplanning consists of placing cells and macros (e.g., RAMs and ROMs or sub-blocks) in their proper locations. The objective is to reduce net RC delays and routing capacitances, thereby producing faster designs. Placing cells and macros in proper locations also helps minimize area and decrease routing congestion.

Almost all designs undergo the floorplanning phase, and time should be spent trying to find the correct placement location of the cells. Optimal placement improves the overall quality of the design. It also helps reduce synthesis-layout iterations. For small and/or slow designs, floorplanning may not be as important as it is for large and/or timing-critical designs consisting of thousands of gates (>150K). For the latter, it is recommended that hierarchical placement and routing of the design be performed. For example, a sub-block is first placed and routed so that it meets all timing and area requirements. The sub-block is subsequently brought in as a fixed macro inside the full design, to be routed with the rest of the cells or macros.

9.2.1.1 Timing Driven Placement

Finding correct locations for cells and macros is time consuming, since each pass requires full timing analysis and verification. If the design fails timing requirements, it is re-floorplanned. This is obviously a slow and often frustrating method. To alleviate this, the layout tool vendors have introduced the concept of timing-driven placement, more commonly referred to as timing-driven layout (TDL).

The TDL method consists of forward annotating the timing information of the design, generated by DC, to the layout tool. When using this method, the physical placement of cells is dictated by the timing constraints. The layout tool gives priority to timing while placing the cells, and tries not to violate the path constraints.

DC generates the timing constraints in SDF format using the following command:


write_constraints -format <sdf | sdf-v2.1>
                  -cover_design
                  -from <from list>
                  -to <to list>
                  -through <through list>
                  -output <output file name>

The above command generates the constraints file in SDF format. Both versions 1.0 and 2.1 are supported. If the layout tool does not support version 2.1, the user may always use the default version 1.0 by specifying "sdf" instead of "sdf-v2.1".

The write_constraints command provides many more options in addition to the ones illustrated in the above example; however, the -cover_design option is the most commonly used. The -cover_design option instructs DC to output just enough timing constraints to cover the worst path through every driver-load pin pair in the design. For additional information regarding this command and its options, the user is advised to refer to the DC reference manual.

A timing constraint file in SDF version 2.1 format, generated by DC with the -cover_design option, is illustrated in Example 9.1. The SDF file contains the TIMINGCHECK field, which holds a PATHCONSTRAINT for all the paths in a design. The last field of the PATHCONSTRAINT timing check contains a set of three numbers that define the path delay for a particular path segment. The three numbers, although the same in this example, correspond to minimum, typical, and maximum delay values. These numbers and their corresponding paths govern the placement of cells during layout.

Example 9.1

(DELAYFILE
  (SDFVERSION "OVI 2.1")
  (DESIGN "hello")
  (DATE "Mon Jul 20 22:59:49 1998")
  (VENDOR "Enterprise")
  (PROGRAM "Synopsys Design Compiler cmos")
  (VERSION "1998.02-2")
  (DIVIDER /)
  (VOLTAGE 2.70:2.70:2.70)
  (PROCESS "TYPICAL")
  (TEMPERATURE 95.00:95.00:95.00)
  (TIMESCALE 1ns)
  (CELL
    (CELLTYPE "hello")
    (INSTANCE)
    (TIMINGCHECK
      (PATHCONSTRAINT INPUT1 U751/A3 U751/ZN U754/I1
        U754/ZN REG0/D (1.523:1.523:1.523))
      (PATHCONSTRAINT INPUT2 U744/A1 U744/Z U745/A1
        U745/ZN REG1/D (1.594:1.594:1.594))
      (PATHCONSTRAINT REG1/CLK REG1/Q U737/I U737/ZN
        OUTPUT1 (3.000:3.000:3.000))
      (PATHCONSTRAINT REG2/CLK REG2/Q U1131/A2
        U1131/ZN REG3/D (25.523:25.523:25.523))
    )
  )
)
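When reviewing a constraints file such as Example 9.1, it can help to list each constrained path with its delay triple. The following Python sketch is one way to do that. It is not part of DC or of any layout tool; the function name and the regular expression are illustrative, and the pattern only handles the single (min:typ:max) form shown in the example.

```python
import re

# A fragment in the style of Example 9.1 (SDF 2.1 PATHCONSTRAINT entries).
SDF_TEXT = """
(TIMINGCHECK
  (PATHCONSTRAINT INPUT1 U751/A3 U751/ZN U754/I1
    U754/ZN REG0/D (1.523:1.523:1.523))
  (PATHCONSTRAINT REG2/CLK REG2/Q U1131/A2
    U1131/ZN REG3/D (25.523:25.523:25.523))
)
"""

# Pin list, then one (min:typ:max) triple, then the closing parenthesis.
PATTERN = re.compile(
    r"\(PATHCONSTRAINT\s+([^()]+?)\s*\(\s*([\d.]+):([\d.]+):([\d.]+)\s*\)\s*\)"
)


def parse_path_constraints(text):
    """Return a list of (pins, (min, typ, max)) tuples, one per PATHCONSTRAINT."""
    constraints = []
    for match in PATTERN.finditer(text):
        pins = match.group(1).split()
        delays = tuple(float(match.group(i)) for i in (2, 3, 4))
        constraints.append((pins, delays))
    return constraints


constraints = parse_path_constraints(SDF_TEXT)
for pins, (dmin, dtyp, dmax) in constraints:
    # First pin is the path start point, last pin is the end point.
    print(f"{pins[0]} -> {pins[-1]}: max {dmax} through {len(pins)} pins")
```

Sorting the resulting list by the maximum value gives a quick view of the tightest constraints that the placement will have to honor.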

It must be noted that, depending upon the size of the design, the generation of the timing constraints for the entire design may take a considerable amount of time. Constraints may be generated for selected timing-critical paths (using the -from, -to and -through options) in order to avoid this problem. Alternatively, users may perform hierarchical place and route, where small sub-blocks are routed first, utilizing the TDL method. Hierarchical place and route is a preferred approach, since it is based upon the "divide and conquer" technique. Dividing the chip into small manageable blocks makes it relatively simpler for designers to tackle the run-time problems.

An alternative approach to performing TDL is to let the layout tool generate the timing constraints based upon the boundary conditions, top-level constraints and timing exceptions of the design. This is a tool-dependent feature: some layout tool vendors support it, while others may not. The layout tool uses its own delay calculator to find the timing constraints for each path in the design in order to place cells. This method is considerably faster than those described previously. A major drawback, however, is that users are now compelled to use and trust the delay calculator of the layout tool. In any case, timing convergence can be achieved with relative ease using this approach.

Performing TDL may also have an impact on the overall area. One may find that the area increases when the above approach is used. However, this point is debatable, with some users insisting that the total area gets reduced because of the rubber-band effect caused by the TDL method, while others swear by the opposite.

9.2.1.2 Back Annotation of Floorplan Information

Total integration with the back-end tools allows DC to perform with increased efficiency, in order to achieve timing and area convergence. DC makes use of several formats that allow the layout information to be read by DC. For post-layout optimization, it is necessary for DC to know the physical location of each sub-block. Using the physical design exchange format (PDEF) grants DC access to this pertinent information. The PDEF file contains the cluster (physical grouping) information and location of cells in the layout.

Pre-placement, the netlist is optimized using the wire-load models, spread across the logical hierarchy. However, the physical hierarchy may be different from the logical hierarchy. Physically, the cells/macros may be grouped depending on the pad locations or some other consideration. Therefore, it is imperative for DC to receive the physical placement information, so that it can optimize the design more effectively. This is done by re-adjusting the application of the wire-loads on the design, based upon the physical hierarchy.

DC uses the following command to read the physical placement information generated by the layout tool in PDEF format:

read_clusters -design <design name> <pdef filename>


Once the netlist has been re-optimized, the physical information may be passed back to the layout tool through the PDEF file. The following command in DC performs this task:

write_clusters -design <design name> -output <pdef filename>

9.2.1.3 Recommendations

a) In general, TDL performs well on all types of designs. However, definitely use TDL for timing-critical and/or high-speed designs, in order to minimize synthesis-layout iterations and achieve timing convergence.

b) When handling large designs, generate timing constraints only for selected nets. This will save you a considerable amount of time. However, if your layout tool is capable of generating its own timing constraints, then it should be given preference over the other approach, in order to save time.

c) Perform hierarchical place and route for large designs. Although tedious, it will generally provide you with the best results as well as better control of the overall flow. Hierarchical place and route also expedites hand editing of the netlist, which is sometimes required after routing is completed.

d) Always use physical placement information in PDEF format while performing post-layout optimization within DC, especially for large hierarchical designs.

9.2.2 Clock Tree Insertion

As explained in previous chapters, it is essential to control the clock latency and skew. Although some designs may actually take advantage of positive skew to reduce power, most designs require minimal clock skew and clock latency. Larger values of clock skew cause race conditions that increase the chance of wrong data being clocked into the flops. Controlling the skew and latency requires a lot of effort and foresight.


As mentioned before, the layout tool performs the clock tree synthesis (CTS for short). CTS is performed immediately after the placement of the cells, and before routing these cells. With input from the designer, the layout tool determines the best placement and style of the clock tree. Generally, designers are asked for the number of levels along with the types of buffers used for each level of the clock tree. Obviously, the number of levels is dependent on the fanout of the clock signal.

In a broad sense, the number of levels of the clock tree is inversely proportional to the drive strength of the gates used in the clock tree. In other words, you will need more levels if low drive strength gates are used, while the number of levels is reduced if high drive strength gates are used.
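The relationship between drive strength and tree depth can be illustrated with a rough first-order model. The Python sketch below assumes that drive strength can be summarized as a maximum number of loads per buffer, which is a simplification introduced here for illustration; real CTS tools also consider wire capacitance, transition times and placement. The function name and the fanout limits are hypothetical.

```python
def clock_tree_levels(num_flops, max_fanout_per_buffer):
    """Levels of a balanced tree in which each buffer drives at most
    max_fanout_per_buffer loads (next-level buffers or flop clock pins)."""
    if num_flops < 1 or max_fanout_per_buffer < 2:
        raise ValueError("need at least one flop and a fanout of 2 or more")
    levels = 1
    capacity = max_fanout_per_buffer  # flops reachable with a single level
    while capacity < num_flops:
        capacity *= max_fanout_per_buffer
        levels += 1
    return levels


# Assumed fanout limits standing in for low, medium and high drive strength.
for fanout in (4, 8, 32):
    print(f"max fanout {fanout:2d}: {clock_tree_levels(500, fanout)} levels for 500 flops")
```

Under these assumptions, a clock driving 500 flops needs fewer levels as the per-buffer fanout limit grows, which is the trend described above.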

To minimize the clock skew and clock latency, designers may find the following recommendations helpful. It must be noted that these recommendations are not hard and fast rules. Designers often resort to using a mixture of techniques to solve the clocking issues.

a) Use a balanced clock tree structure with the minimum number of levels possible. Try not to go overboard with the number of levels. The more levels there are, the greater the clock latency.

b) Use high drive strength buffers in large clock trees. This also helps in reducing the number of levels.

c) In order to reduce clock skew between different clock domains, try balancing the number of levels and types of gates used in each clock tree. For instance, if one clock is driving 50 flops while the other clock is driving 500 flops, then use low drive strength gates in the clock tree of the first clock, and high drive strength gates for the other. The idea here is to speed up the clock driving 500 flops, and slow down the clock that is driving 50 flops, in order to match the delay between the two clock trees.

d) If your library contains balanced rise and fall buffers, you may prefer to use these instead. Remember, it is not always true that the balanced rise and fall buffers are faster (less cell delay) than the normal buffers. Some libraries provide buffers that have lower cell delays for rise times of signals, as compared to the fall times. For designs utilizing positive edge triggered flops, these buffers may be an ideal choice. The idea is to study the library and choose the most appropriate gate available. Past experience also comes in handy.

e) To reduce clock latency, you may try to use high drive inverters for two levels. This is because, logically, a single buffer cell consists of two inverters connected together, and therefore has a cell delay of two inverters. Using two separate inverters (two levels) will achieve the same function, but will result in reduced overall cell delay, since you are not using another buffer (two more inverters) for the second level. Use this approach only for designs that do not contain gated clocks. The reason for this is explained later (point h).

f) Do not restrict yourself to using the same type and drive strength gate for CTS. Current layout tools allow you to mix and match.

g) For a balanced clock tree (e.g., 3 levels), the first level is generally a single buffer driven by the Pad. In order to reduce clock skew, the first level buffer is placed near the center of the chip, so that it can connect to the next level of buffers through equal interconnect wires. This creates a ring-like structure with the first buffer in the center, the second set of buffers (second level) surrounding it, and the last stage surrounding the second level. Thus, the distances between the first, second and third levels are kept at a minimum. However, although this is a good arrangement, it does result in the first level buffer being placed farthest from the source (Pad). If a minimum size wire is used to route the clock network from the Pad source to the first buffer, it will result in a large RC delay that will affect the clock latency. Therefore, it is necessary to size up (widen) this wire from the Pad source to the input of the buffer (first level), in order to reduce the resistance of the wire, thereby reducing the overall latency. Depending upon the size of your design and the number of levels, you may also need to perform this operation on other levels.

h) In order to minimize the skew, the layout tool should have the ability to tap the clock signal from any level of the clock tree. This is especially important for designs that contain gated clocks. If the same clock is used for other ungated flops, then the gate results in additional delay, and hence skew. If the clock tree ended at the gate, the additional delay would cause a large skew between the gated-clock flop and the ungated-clock flop, as shown in Figure 9-1 (a). Therefore, it is necessary to tap the clock source from a level up for the gated-clock flop, while maintaining the full clock tree for the ungated-clock flop, as illustrated in Figure 9-1 (b). However, if inverters are used in the clock tree (point e), then the above approach breaks down. In this case, do not use inverters as part of the clock tree.
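The effect of widening the Pad-to-first-buffer wire (point g) can be estimated with a simple distributed-RC (Elmore) calculation. The Python sketch below is only an illustration: the sheet resistance, area capacitance and fringe capacitance values are made-up defaults, not data from any process, and the function name is hypothetical. Widening lowers the resistance in proportion to the width, while the capacitance grows more slowly because the fringe component does not scale with width.

```python
def wire_delay_ns(length_um, width_um, load_ff,
                  sheet_res_ohm=0.08,          # assumed ohms per square
                  area_cap_ff_per_um2=0.04,    # assumed fF per square micron
                  fringe_cap_ff_per_um=0.05):  # assumed fF per micron of length
    """Elmore delay of a wire modeled as a distributed RC line driving a lumped load."""
    resistance = sheet_res_ohm * length_um / width_um          # ohms
    wire_cap = (area_cap_ff_per_um2 * length_um * width_um
                + fringe_cap_ff_per_um * length_um)            # fF
    delay_fs = resistance * (0.5 * wire_cap + load_ff)         # ohm * fF = fs
    return delay_fs * 1e-6                                     # fs -> ns


# A 5 mm route from the Pad to the first-level buffer with a 200 fF input load.
narrow = wire_delay_ns(5000, 0.5, 200)
wide = wire_delay_ns(5000, 2.0, 200)
print(f"0.5 um wide: {narrow:.3f} ns, 2.0 um wide: {wide:.3f} ns")
```

With these assumed numbers, the wider wire has the smaller delay. The actual benefit depends on the process parameters and on how much extra capacitance the driver can tolerate.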


9.2.3 Transfer of Clock Tree to Design Compiler

Clock tree synthesis done by the layout tool modifies the physical design (cells are added in the clock network). This modification is absent from the original netlist present in DC. Therefore, it is necessary for the user to accurately transfer this information to DC. There are several ways to do this.

a) Generally, all layout tools have the capability to write out the design in EDIF or Verilog format. Since everything may appear flat to the layout tool, designers may receive a flat netlist from the layout tool. Of course, this netlist will contain the clock tree information, but the enormous size of the netlist itself may be daunting and unmanageable. Furthermore, due to the absence of the original design hierarchy, the flat netlist is not easily readable. Another problem with this approach is that the user is now forced to designate this netlist as the "golden" netlist, meaning that all verification (LVS etc.) has to be performed against this netlist. Doing this is comparable to "digging your own grave" because, if the layout tool botches the layout, the same anomalies will be reflected in the netlist. Of course, the LVS will pass without flagging any errors, since the user is checking physical layout data against a layout-generated netlist, i.e., performing LVL instead of LVS. An alternative is to perform formal verification between the original hierarchical netlist and the layout-generated flat netlist. This certainly is a viable approach, but has its own limitations regarding the size and complexity of the design. Most formal verification tools suffer from this limitation, i.e., they excel in individual block verification, but fall short in full chip verification. This is especially true for verifying flat netlists against hierarchical netlists.

b) The second approach is to transfer only the point-to-point clock delay information, starting from the clock source to its endpoints (clock pins of the flops). The delay calculator of the layout tool will perform this task and, upon instruction, will provide the designer with the point-to-point timing information of the clock tree in SDF format. Designers may back annotate this SDF file to the original design in order to determine the clock latency and skew. This method does not require the clock tree to be transferred to DC from the layout tool. However, this approach has its own pitfalls. Primarily, this approach does not allow the usage of SPF data from the layout for back annotation to PT. Furthermore, the designer is now compelled to trust the delay calculator of the layout tool, i.e., another variable has been introduced that requires qualification. Since the layout libraries are separate from the Synopsys libraries, the timing numbers present in the Synopsys library need to match those of the layout library exactly in order to get the same delay numbers. The dilemma of verifying the original netlist against the layout database still exists, especially since the original netlist does not contain the extra cells and nets due to clock tree insertion. However, one can certainly find workarounds and may use this approach successfully.

c) A solution to all of the above problems is to creatively transfer the entire clock tree to DC without changing the hierarchy of the design. Some layout tools may even generate Synopsys scripts that contain dc_shell-t commands like disconnect_net, create_cell, create_port and connect_net. These commands, on execution, insert the clock tree into the original design database in DC, while still maintaining the hierarchy. Of course, one needs to verify the resulting modified netlist against the original netlist by performing formal verification. Since the design hierarchy is not altered, the formal verification runs smoothly.

d) Another solution involves brute force modification. Generally the layout tools, upon completion of CTS, produce a summary report of all changes made to the design. One may take advantage of this report and parse it to retrieve the relevant information (e.g., name of clock tree insertion points, type and name of buffers etc.) using scripting languages like Perl or Awk. Once the information is gathered, the original Verilog netlist may be directly modified without going through DC. The modified netlist should be read back into DC to check for any syntax errors. In addition, the modified netlist should also be formally verified against the original netlist.
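The report-parsing step of approach (d) might look like the following Python sketch (Python is used here in place of Perl or Awk). Everything about the report is an assumption made for illustration: CTS summary formats are tool specific, and the `BUFFER` line layout, the cell names and the pin names `A` and `Z` below are hypothetical. The sketch only produces the Verilog lines for the inserted buffers; reconnecting the flop clock pins to the new leaf nets is not shown.

```python
# Hypothetical report format, one inserted buffer per line:
#   BUFFER <library cell> <instance name> <input net> <output net>
REPORT = """\
# CTS summary (made-up format)
BUFFER CLKBUFX8 cts_L1_0 clk clk_L1_0
BUFFER CLKBUFX4 cts_L2_0 clk_L1_0 clk_L2_0
BUFFER CLKBUFX4 cts_L2_1 clk_L1_0 clk_L2_1
"""


def parse_cts_report(text):
    """Collect one record per inserted buffer, skipping any other line."""
    buffers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "BUFFER":
            cell, inst, in_net, out_net = fields[1:]
            buffers.append({"cell": cell, "inst": inst, "in": in_net, "out": out_net})
    return buffers


def verilog_instances(buffers, in_pin="A", out_pin="Z"):
    """Return a wire declaration and an instantiation for each buffer.
    The pin names are assumptions; take the real ones from the library."""
    lines = []
    for b in buffers:
        lines.append(f"wire {b['out']};")
        lines.append(
            f"{b['cell']} {b['inst']} (.{in_pin}({b['in']}), .{out_pin}({b['out']}));"
        )
    return lines


buffers = parse_cts_report(REPORT)
print("\n".join(verilog_instances(buffers)))
```

As the text recommends, a netlist edited this way should be read back into DC and formally verified against the original.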

Recently, upon realizing this problem, the layout tool vendors have facilitated this process by generating the hierarchical netlist from the layout database. This netlist contains the clock tree information and should be verified formally against the original netlist. Upon successful verification, the netlist may be declared as "golden".


9.2.4 Routing

After the clock tree insertion, the final step involves routing the chip. In a broad sense, the routing is divided into two phases:

1. Global Routing, and

2. Detailed Routing.

The first routing phase is called the global route, in which the global router assigns a general pathway through the layout for each net. During global route, the layout surface is divided into several regions. The global router decides the shortest route through each region in the layout surface, without laying the geometric wires.

The second routing phase is called the detailed route. The detailed router makes use of the information gathered by the global route and routes the geometric wires within each region of the layout surface.

It must be noted that if the run-time of global route is long (more than the placement run-time), it indicates a bad placement quality. In this case, the placement should be performed again with emphasis on reduced congestion.

9.2.5 Extraction

Until now, synthesis and optimization were performed utilizing the wire-load models. The wire-load models are based on statistically estimating the final routing capacitances. Because of the statistical nature of wire-load models, they may be completely inaccurate compared to the real delay values of the routed design. This variation between the wire-load models and the real delay values results in a non-optimized design.

The layout database is extracted to produce the delay values necessary to further optimize the design. These values are back annotated to PT for static timing analysis, and to DC for further optimization and refinement of the design.


9.2.5.1 What to Extract?

In general, almost all layout tools are capable of extracting the layout database using various algorithms. These algorithms define the granularity and the accuracy of the extracted values. Depending upon the chosen algorithm and the desired accuracy, the following types of information may be extracted:

1. Detailed parasitics in DSPF or SPEF format.

2. Reduced parasitics in RSPF or SPEF format.

3. Net and cell delays in SDF format.

4. Net delay in SDF format + lumped parasitic capacitances.

The DSPF (Detailed Standard Parasitic Format) contains RC information of each segment (multiple R's and C's) of the routed netlist. This is the most accurate form of extraction. However, due to long extraction times on a full design, this method is not practical. This type of extraction is usually limited to critical nets and clock trees of the design.

The RSPF (Reduced Standard Parasitic Format) represents RC delays in terms of a pi model (2 C's and 1 R). The accuracy of this model is less than that of DSPF, since it does not account for multiple R's and C's associated with each segment of the net. Again, the extraction time may be significant, thus limiting the usage of this type of information. Target applications are critical nets and small blocks of the design.

Both detailed and reduced parasitics can be represented by OVI's (Open Verilog International) Standard Parasitic Exchange Format (SPEF).
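To see why the number of R's and C's kept per net affects accuracy, the following Python sketch computes the Elmore delay of the same wire described two ways: as four RC segments and as a single lumped segment. This is generic RC arithmetic with made-up values, offered only as an illustration. It is not the algorithm used by any extraction tool, nor the way PT interprets DSPF or RSPF data.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.
    resistances[i] is the series resistance of segment i and
    capacitances[i] is the capacitance at the node that follows it."""
    if len(resistances) != len(capacitances):
        raise ValueError("one capacitance per resistance is expected")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r              # total resistance between driver and this node
        delay += upstream_r * c      # each capacitor charges through that resistance
    return delay


# Same wire (200 ohm, 100 fF in total), described with different granularity.
detailed = elmore_delay_ladder([50.0] * 4, [25.0] * 4)   # four segments
lumped = elmore_delay_ladder([200.0], [100.0])           # one segment
print(f"four segments: {detailed} fs, one segment: {lumped} fs")
```

With ohms and femtofarads, the result is in femtoseconds. The coarser description gives a noticeably different (here more pessimistic) number for the same physical wire.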

The last two (numbers 3 and 4) are the most common types of extraction used by designers. Both utilize the SDF format. However, there is a major difference between the two. Number 3 uses the SDF to represent both the cell and net delays, whereas number 4 uses the SDF to represent only the net delays. The lumped parasitic capacitances are generated separately. Some layout tools generate the lumped parasitic capacitances in the Synopsys set_load format, thus facilitating direct back annotation to DC or PT.
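If a layout tool reports lumped capacitances in some other form, a small script can convert them into a file that can be sourced in DC or PT. The Python sketch below assumes the capacitances are available as a net-name to value mapping in the library's capacitance unit, and it writes one `set_load <value> <objects>` line per net using `get_nets`. The net names and values are invented, and the exact command form expected by your tool version should be checked against the reference manual.

```python
def set_load_commands(net_caps):
    """Format lumped net capacitances as one set_load command per net.
    net_caps maps a net name to its capacitance in library units (assumed)."""
    lines = []
    for net in sorted(net_caps):
        # Braces keep Tcl from interpreting special characters in the net name.
        lines.append(f"set_load {net_caps[net]:.4f} [get_nets {{{net}}}]")
    return lines


example_caps = {"clk": 0.25, "U1/n12": 0.04213}  # hypothetical nets and values
for command in set_load_commands(example_caps):
    print(command)
```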


It is worth mentioning that PT can read all five formats (DSPF, RSPF, SPEF, SDF and set_load), whereas DC can only read the SDF and set_load file formats. The SDF and set_load file formats are not as accurate as the DSPF or RSPF types of extraction; however, the time to extract the layout database is significantly reduced. For most designs this type of extraction provides sufficient accuracy and precision. However, as suggested, only critical nets and clocks in the design should be targeted for DSPF or RSPF types of extraction.

For the layout tool to generate a full SDF (the number 3 approach), it uses its own delay calculator to compute the cell delays, which are based upon the output loading and the transition time of the input signal. However, there is a flaw in using this approach. The synthesis was done using DC, which used its own delay calculator to optimize the design. By choosing to use the full SDF generated by the layout tool, we are now introducing another variable that needs qualification. How do we know that the delay calculator used by the layout tool is more accurate than the one used by DC? Also, upon back annotation of the full SDF to PT, the full capability of PT is not being utilized. This is because the cell delays are already fixed in the SDF file, and performing case analysis in PT will not yield accurate results, even if the conditional delays are present in the SDF file. This is discussed at length in Chapter 11.

Another problem exists with the above approach. Since only the cell and net delays are back annotated, DC does not know the parasitic capacitances associated with each net of the design. Therefore, when performing post-layout optimization, DC can only make use of the wire-load models to make incremental changes to the design, thus defeating the whole purpose of back annotation. However, if the fourth approach is used (net delays in SDF format + lumped parasitic capacitances), DC makes use of the net loading information during post-layout optimization (e.g., to size gates up or down).

To avoid these problems, it is recommended that only the net RC delays (also called interconnect wiring delays) and lumped parasitic capacitances be extracted from the layout database. Upon back annotation, DC or PT uses its own delay calculator to compute the cell delays, based upon the back annotated interconnect RC's and capacitive net loading.


To summarize, it is recommended that the following types of information be generated from the layout tool for back annotation to DC in order to perform post-layout optimization:

a) Net RC delays in SDF format.

b) Capacitive net loading values in set_load format.

For static timing analysis using PT, the following types of information can be generated:

a) Net RC delays in SDF format.

b) Capacitive net loading values in set_load format.

c) Parasitic information for clock and other critical nets in DSPF, RSPF or SPEF file formats.

9.2.5.2 Estimated Parasitic Extraction

The extraction of parasitics at the pre-route level (after global routing) provides a closer approximation to the parasitic values of the final routed design. If the estimates indicate a timing problem, it is fairly easy to quickly re-floorplan the design before starting the detailed route. This method reduces synthesis-layout iterations and avoids wasting valuable time.

The difference between the estimated delay values extracted after the global routing and the real delay values after the detailed routing is minimal. In contrast, the difference between the estimated delay values from floorplan extraction and those from detailed route extraction may be significant. Therefore, it is prudent that after floorplanning, cell placement and clock tree insertion, the design be globally routed before extracting the estimated delay numbers.

A complete extraction flow is shown in Figure 9-2. If major timing violations exist after global route, it may be necessary to re-optimize the design within DC with the estimated delays back annotated. However, if the timing violations are not severe, then re-floorplanning (and/or re-placement of cells) the design may achieve the desired result. The detailed routing should be performed only after eliminating all timing violations found after the global routing phase.


9.2.5.3 Real Parasitic Extraction

A full extraction (actual values, no estimation) is performed after the design has been satisfactorily globally routed, i.e., no DRC violations and achievement of required die-size.

This is by far the most critical part of the entire process. The final product may not work if the extracted values are not accurate. With technologies shrinking to 0.18 micron and below, the extraction algorithm of the layout tool needs to take into account the second and third order parasitic effects. Any slight deviation in these values may cause the design to fail.

Consider a case where the extracted values are too pessimistic. Static timing analysis indicates that the signals are meeting hold-time requirements. In reality, however, the signals arrive faster, causing real hold-time violations; the design passes static timing only because of the pessimistic back annotated parasitic capacitances. The case for setup-time is similar, if the extracted values are too optimistic.
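The hold-time scenario can be made concrete with a small worked calculation. The Python sketch below uses a simplified hold check and invented numbers (all in ns); the function name, the delay values and the skew convention are assumptions for illustration and do not reproduce PT's full analysis.

```python
def hold_slack(clock_to_q, path_delay, hold_time, clock_skew=0.0):
    """Hold slack in ns; a positive value means the check passes.
    Data launched at a clock edge must not reach the capturing flop before
    its hold window closes. clock_skew is the capture clock arrival minus
    the launch clock arrival."""
    return (clock_to_q + path_delay) - (hold_time + clock_skew)


# Pessimistic extraction: capacitances are overestimated, so the path looks slow.
reported = hold_slack(clock_to_q=0.20, path_delay=0.30, hold_time=0.15, clock_skew=0.25)
# Silicon: the same path is actually faster.
actual = hold_slack(clock_to_q=0.20, path_delay=0.15, hold_time=0.15, clock_skew=0.25)

print(f"reported hold slack: {reported:+.2f} ns")
print(f"actual hold slack:   {actual:+.2f} ns")
```

With these numbers the reported slack is positive while the actual slack is negative, which is exactly the situation in which static timing passes but the part fails.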

9.3 Post-Layout Optimization

Post-layout optimization is performed to further optimize and refine the design. The process involves back annotating the data generated by the layout tool to the design residing in DC. Depending upon the severity of violations, the optimizations may include full synthesis or minor adjustments through the use of the in-place optimization (IPO) technique. As explained in the previous section, the layout-related data suitable for back annotation to DC are:

a) Net RC delays in SDF format.

b) The set_load file, containing capacitive net loading.

c) Physical placement information in PDEF format.


9.3.1 Back Annotation and Custom Wire Loads

The next step involves analyzing the static timing of the design. Designers may choose to perform this step using PT or DC's internal static timing analysis engine. In any case, post-layout optimization can only be performed within DC; therefore, the layout data needs to be back annotated to both DC and PT.

Depending on the process technology, the layout tool may generate two separate files that correspond to the worst and the best case. If there are two separate Synopsys libraries pertaining to each case, then back annotate the worst case layout data to the design using the worst case Synopsys library. Similarly, best case layout data should be back annotated to the design mapped to the best case Synopsys library.

Some vendors provide only one Synopsys library that covers all cases, i.e., the library is characterized for the TYPICAL case, with the WORST and the BEST case values derived (derated) from the TYPICAL case. In a situation like this, it is recommended that the designer back annotate the worst case numbers to the design with operating conditions set to WORST, in order to perform the worst case timing analysis. The best case timing analysis should be performed with best case timing numbers back annotated to the design with operating conditions set to BEST.

Use the following dc_shell-t commands to back annotate layout-generated information to the design present in DC, before performing post-layout optimization.

dc_shell-t> current_design <design name>

dc_shell-t> source <set_load file name>

dc_shell-t> read_sdf <RC file name in SDF format>

dc_shell-t> read_clusters <cluster file name in PDEF format>

Use the following pt_shell commands to back annotate layout-generated information to the design in PT, before performing static timing analysis.


pt_shell> current_design <design name>

pt_shell> source <set_load file name in PT format>

pt_shell> read_sdf <RC file name in SDF format>

pt_shell> read_parasitics <DSPF, RSPF or SPEF file name>

After back annotation in PT, if the design fails static timing with a substantial number of violations, the user may need to perform re-synthesis (or even re-code certain blocks). Therefore, it is prudent to use the existing layout information for re-synthesis. Discarding the layout data during re-synthesis only wastes the time and effort spent on layout. Furthermore, the layout data is helpful in fine tuning the design. To achieve the maximum benefit, custom wire-load models should be generated through DC, using the existing layout information. The resulting gate level netlist using the custom wire-load models provides a closer match to the post-layout timing results. Use the following dc_shell-t command to create custom wire-load models:

create_wire_load -design <design name> \
                 -cluster <cluster name> \
                 -trim <trim value> \
                 -percentile <percentile value> \
                 -output <output file name>

Although there are other options available for the above command, the ones listed above generally suffice for most designs. The trim value is used to discard data that falls below a certain value, while the percentile value is used to calculate the average value. By altering the percentile value, one may add optimism or pessimism to the custom wire-load models. The cluster name is the grouping name that was used during layout to group cells or blocks together.
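As a rough illustration of how a trim value and a percentile value can interact, the Python sketch below filters a list of extracted net capacitances and picks one representative value. It is a toy model only: the sample data, the helper name representative_cap and the nearest-rank percentile are assumptions made for illustration, and the statistics that DC actually applies inside create_wire_load may differ.

```python
import math

def representative_cap(caps, trim, percentile):
    """Toy model (hypothetical helper, not DC's algorithm): drop samples
    below `trim`, then return the nearest-rank `percentile` of the rest."""
    kept = sorted(c for c in caps if c >= trim)
    if not kept:
        raise ValueError("all samples were trimmed")
    rank = max(1, math.ceil(percentile / 100.0 * len(kept)))
    return kept[rank - 1]

# Made-up extracted capacitances (pF) for nets with the same fanout.
caps = [0.001, 0.02, 0.03, 0.04, 0.05, 0.08, 0.12]

# A middle percentile gives a typical value; a high one adds pessimism.
typical = representative_cap(caps, trim=0.01, percentile=50)
pessimistic = representative_cap(caps, trim=0.01, percentile=90)
print(typical, pessimistic)
```

In this sketch the 0.001 pF outlier is discarded by the trim value, and raising the percentile moves the chosen value toward the largest remaining sample, which corresponds to a more pessimistic wire-load estimate.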

After the creation of the custom wire-load models (CWLM), the library should be updated to account for the new CWLMs. This is because the original technology library contains only the generic wire-load models that are not particular to a specific design. To use the CWLMs that were generated by the above command, the library must be updated. The following command may be used to update the library present in DC memory:

update_lib <library name> <CWLM file name>

It must be noted that the above command does not alter or overwrite the source library. It only updates the DC memory to include the new CWLMs.

9.3.2 In-Place Optimization

For designs with minor timing violations after layout, there is no need to perform a full chip synthesis. In-place optimization or IPO is an excellent method to fine-tune a design, in order to eliminate these violations. The concept of IPO is to keep the structure of the design intact while modifying only the failing parts of the design, thereby having a minimal impact on the existing layout. IPO is commonly used to add/swap gates at specific locations to fix setup and/or hold-time problems.

The IPO is library dependent and can be limited to perform only the following:

a) Resize cells.
b) Insert or delete existing cells (mainly buffers).

Usually, all Synopsys technology libraries have an attribute defined that enables or disables the IPO. The attribute and its value, that enables IPO in a library is:

in_place_swap_mode : match_footprint

Along with the above library level attribute, all cells in the Synopsys library also have the cell_footprint information. For example, two cells with same functionality, but different drive strengths may have the same cell_footprint value. This means that the two cells have identical physical coverage area, therefore replacing one for the other will not impact the existing layout, i.e., adjacent cells will not shift. However, this restriction, along with cell sizing, area optimization and buffer insertion, is controlled by the use of the following variables:

set compile_ignore_footprint_during_inplace_opt true | false
set compile_ok_to_buffer_during_inplace_opt true | false
set compile_ignore_area_during_inplace_opt true | false
set compile_disable_area_opt_during_inplace_opt true | false

Using these variables gives the designer the ability to control the amount of changes made in the design. The appropriate values of the above variables may be set before performing IPO at the dc_shell-t command line, or in the Synopsys setup file.

The IPO is invoked by using the following commands:

dc_shell-t> compile -in_place

dc_shell-t> reoptimize_design -in_place

Both commands are similar in nearly all respects, i.e., both make use of the back annotated layout information, with the exception of physical information. The reoptimize_design makes use of the physical location information while re-optimizing the design. Another difference between the two commands is that the "compile -in_place" command uses the library wire-load models during IPO, whereas the "reoptimize_design -in_place" command makes use of the custom wire-load models. Therefore, it is imperative, that the latter command be used while performing IPO.

It must be noted that the reoptimize_design when used on its own makes major modifications to the design. To eliminate this possibility, always use the -in_place option.

9.3.3 Location Based Optimization

Location Based Optimization or LBO is an integral part of IPO, and is invoked automatically while performing IPO for designs containing back annotated physical placement location information in PDEF format. In this chapter the LBO is being identified separately from IPO due to its importance and additional capability.

Performing IPO with LBO improves the overall optimization of the design, since DC now has access to the cell placement information. This allows DC to apply more powerful algorithms during optimization.

Consider a path segment starting from primary input and ending at a flop. Post-layout timing analysis reveals a hold-time problem for this path. As shown in Figure 9-3, LBO optimization will add buffer(s) near the flop (endpoint), instead of adding it at the source (startpoint). Adding buffer(s) at the source may cause a setup-time failure for another path originating from the same source.
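The effect of the buffer location can be checked with a small worked example. The Python sketch below uses made-up slack numbers for two paths sharing one startpoint and compares adding the same delay at the source against adding it near the failing endpoint. The path names, slack values and the buffer delay are all assumptions for illustration and do not come from any tool.

```python
BUFFER_DELAY = 0.3  # ns, assumed delay of one inserted buffer

# Made-up slacks (ns) for two paths that share the same startpoint.
paths = {
    "to_flop_A": {"setup_slack": 2.0, "hold_slack": -0.2},  # hold violation
    "to_flop_B": {"setup_slack": 0.1, "hold_slack": 0.5},   # tight setup
}

def add_delay(paths, affected, delay):
    """Return new slacks after adding `delay` to the data path of every
    path named in `affected`: hold slack improves, setup slack degrades."""
    result = {}
    for name, s in paths.items():
        d = delay if name in affected else 0.0
        result[name] = {
            "setup_slack": round(s["setup_slack"] - d, 3),
            "hold_slack": round(s["hold_slack"] + d, 3),
        }
    return result

def violations(slacks):
    """List (path, check) pairs whose slack is negative."""
    return sorted((name, kind)
                  for name, s in slacks.items()
                  for kind, value in s.items() if value < 0)

# A buffer at the shared source delays both paths.
at_source = add_delay(paths, {"to_flop_A", "to_flop_B"}, BUFFER_DELAY)
# A buffer near flop A delays only the failing path.
at_endpoint = add_delay(paths, {"to_flop_A"}, BUFFER_DELAY)

print("source:  ", violations(at_source))
print("endpoint:", violations(at_endpoint))
```

With these numbers, buffering at the source cures the hold violation on the first path but pushes the second path into a setup violation, whereas buffering near the endpoint leaves the second path untouched.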

In addition to the enhanced capability of inserting buffers at optimal locations, LBO also provides better modeling of cross cluster nets, and the new nets that were created to connect the inserted or deleted buffers. This is due to the fact that, DC is aware of the location where the cells were inserted or deleted.

In order for the "reoptimize_design -in_place" to perform LBO, the following variables need to be set to true; in addition to the IPO related variable "compile_ok_to_buffer_during_inplace_opt":

set lbo_buffer_removal_enabled true

set lbo_buffer_insertion_enabled true

LBO is not enabled for buffer insertion or removal, if the value of above variables is set to false (default). Location information is disregarded for this case, with only IPO algorithms used to handle the buffer insertion or deletion.

The changes made by performing IPO and/or LBO, using both "compile -in_place" or "reoptimize_design -in_place", can be written out to a file by using the following dc_shell-t variable:

set reoptimize_design_changed_list_file_name <file name>

If the file already exists, the new set of changes will be appended to the same file.

9.3.4 Fixing Hold-Time Violations

Nearly every design undergoes the process of fixing hold-time violations, especially for faster technologies. Designers tackle this problem using various approaches. For this reason, a separate section is devoted to discuss issues arising from using these methods, and to consolidate them under one topic.

Most designers synthesize the design with tight constraints in order to maximize the setup-time. The resulting effect is a fast logic with data arriving faster at the input of the flop, with respect to the clock. This may result in hold-time violations due to data changing value before being latched by the flop. Generally, designers prefer to fix the hold-time violations after initial placement and routing of the design, thereby making use of more accurate delay numbers.

Removing hold-time violations involves delaying the data with respect to the clock, so that the data does not change for a specified amount of time (hold-time) after the arrival of the clock edge. There are several methods utilized by designers to insert the appropriate delays, as outlined below:

a) Using Synopsys methodology.
b) Inserting delays manually.
c) Inserting delays automatically, by using brute force dc_shell-t commands.
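Whichever method is chosen, the hold-time condition described above reduces to simple arithmetic. The following Python sketch computes a hold slack and a first estimate of how many buffers would remove a violation. The function names, the delay figures and the single per-buffer delay are assumptions for illustration; a real fix must also confirm that the added delay does not consume the setup slack of the same path.

```python
import math

def hold_slack(data_arrival_min, clock_arrival, hold_time):
    """Hold slack in ns: positive means the data stays stable long enough
    after the capturing clock edge."""
    return data_arrival_min - (clock_arrival + hold_time)

def buffers_needed(slack, buffer_delay_min):
    """Smallest number of buffers whose combined best-case delay
    removes a negative hold slack (0 if there is no violation)."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay_min)

# Made-up best-case numbers (ns) for one failing path.
slack = hold_slack(data_arrival_min=0.60, clock_arrival=0.50, hold_time=0.35)
count = buffers_needed(slack, buffer_delay_min=0.10)
print(f"hold slack = {slack:.2f} ns, buffers needed = {count}")
```

Best-case delays are used here because hold checks are most critical when the data path is fastest.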

9.3.4.1 Synopsys Methodology

Synopsys provides the following dc_shell-t command, which enables the compile command to fix the hold-time violations:

set_fix_hold <clock name>

dc_shell-t> set_fix_hold CLK

The above command may be used during initial compile (or post-layout), by setting the min/max library concurrently (version DC98 onwards), and specifying the min/max values for set_input_delay command. The idea behind setting the min/max library at the same time is to eliminate the two-pass (initial synthesis for maximum setup-time and re-optimization to fix hold-time violations) synthesis needed for almost all designs. Example 9.2 illustrates the methodology of fixing the post-layout hold-time violations using the single pass synthesis approach.

Example 9.2

set_min_library "<worst case library name>" \
                -min_version "<best case library name>"

set_operating_conditions -min BEST -max WORST

source net_delay.set_load
read_timing interconnect.sdf
read_clusters floorplan.pdef

set_input_delay -max 20.0 -clock CLK [list IN1 IN2]
set_input_delay -min -1.0 -clock CLK [list IN1 IN2]

set_output_delay -max 10 -clock CLK [all_outputs]

set_fix_hold CLK

reoptimize_design -in_place

Alternatively, the design may be compiled with maximum setup-time using the worst case library, followed by re-optimization after layout, in order to fix the hold-time violations by mapping the design to the best case library. Although, this method uses the two-pass synthesis approach, it is still recommended because of its stable nature. Most designers prefer to use this approach because of the time and effort that has been invested in defining and maturing this methodology.

It must be noted that the above command is independent of IPO commands. The IPO commands are generally used to fix hold-time violations after initial layout with layout information back annotated to the design. Fixing pre-layout hold-time violations is accomplished by compiling the design incrementally, using the "compile -incremental" command.

The set_fix_hold command instructs DC to fix the hold-time violations by inserting the buffers at appropriate locations. This again is controlled by the IPO related variables described previously. With buffer insertion disabled, the cells in the data path can only be swapped (i.e., replacing a higher drive strength gate with a lower drive strength gate) to increase the cell delay, thereby delaying the data arriving at the flop input.

9.3.4.2 Manual Insertion of Delays

If the timing analysis reveals a very small number of hold-time violations (less than 10 to 20 places), it may not be worthwhile to fix these violations using the set_fix_hold command. The delays in this case may be manually inserted in the netlist. The designer may chain a string of buffers to delay the data, with respect to the clock just enough, so that it passes the hold-time checks.

A point to note however is that a chain of buffers may not be able to provide adequate delay, since the delay is dependent on the placement of the buffers in the layout. Generally, the buffers will be placed very close to each other; therefore, the total delay will be the sum of the delay through each cell. The interconnect delay itself will be insignificant, due to the close proximity of the placed cells. To overcome this, it is recommended to link a number of high-fanin gates (e.g., 8 input AND gate) with all inputs tied together, and connected to form a chain. The advantage of using this approach is that now the input pin capacitance of the high fanin gates is being utilized. The input pin capacitance of a high fanin gate is usually much larger than a single input buffer. Therefore, this method of delaying the data provides a solution that is independent of the cell placement location in the layout. This approach is suitable if the technology library does not contain delay cells. If these cells are present, then they should be targeted to fix the hold-time violations.

9.3.4.3 Brute Force Method

This is a unique method, but requires expertise in scripting languages like Perl or Awk.

For instance, the timing report may show many hold-time violations, and fixing them through the Synopsys methodology may mean large run times. In this case, an alternative approach to find the amount of slack (setup-time analysis) and the corresponding violation (hold-time analysis) of the failing paths is to parse the timing report (both for worst case and best case) using a scripting language. Using these numbers, the user may generate dc_shell-t commands like disconnect_net, create_cell and connect_net for the failing paths. Executing these commands in dc_shell-t forces DC to insert and connect buffers at appropriate places (this should be done near the endpoints of the failing paths). This is called a brute force method, done automatically.

This by no means is a clean approach, but works remarkably well. The time taken to fix hold-time violations using this approach is negligible as compared to the Synopsys methodology.
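A minimal sketch of such a script is given here in Python rather than Perl or Awk. It assumes the timing reports have already been reduced to one line per failing endpoint in a hypothetical four-column format (endpoint pin, net driving it, hold slack, setup slack), which is not the native report format. The buffer reference mylib/BUFX1, its pin names A and Z, the per-buffer delays and the generated instance names are likewise placeholders. The emitted command arguments are assumed forms, and create_net is an addition not named in the text, so both should be checked against the tool's man pages before use.

```python
import math

BUF_REF = "mylib/BUFX1"      # hypothetical library buffer
BUF_IN, BUF_OUT = "A", "Z"   # hypothetical buffer pin names
BUF_DELAY_MIN = 0.15         # ns, assumed best-case delay per buffer
BUF_DELAY_MAX = 0.30         # ns, assumed worst-case delay per buffer

def parse_records(text):
    """Parse lines of the assumed form:
    <endpoint pin> <net> <hold slack> <setup slack>"""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pin, net, hold, setup = line.split()
        records.append((pin, net, float(hold), float(setup)))
    return records

def fix_commands(records):
    """Emit commands that insert a buffer chain just before each endpoint
    with a hold violation, unless the setup slack cannot absorb it."""
    cmds = []
    for idx, (pin, net, hold, setup) in enumerate(records):
        if hold >= 0:
            continue  # no hold violation on this endpoint
        n = math.ceil(-hold / BUF_DELAY_MIN)
        if n * BUF_DELAY_MAX > setup:
            cmds.append(f"# skipped {pin}: not enough setup slack")
            continue
        cmds.append(f"disconnect_net {net} {pin}")
        prev = net
        for k in range(n):
            cell = f"hold_buf_{idx}_{k}"
            new_net = f"hold_net_{idx}_{k}"
            cmds.append(f"create_cell {cell} {BUF_REF}")
            cmds.append(f"create_net {new_net}")
            cmds.append(f"connect_net {prev} {cell}/{BUF_IN}")
            cmds.append(f"connect_net {new_net} {cell}/{BUF_OUT}")
            prev = new_net
        cmds.append(f"connect_net {prev} {pin}")
    return cmds

report = """
# endpoint_pin    net   hold_slack  setup_slack
u_core/reg_a/D    n101  -0.20       3.00
u_core/reg_b/D    n202   0.05       1.00
u_core/reg_c/D    n303  -0.10       0.10
"""

cmds = fix_commands(parse_records(report))
print("\n".join(cmds))
```

The generated text can be saved to a file and sourced in dc_shell-t. Endpoints that are skipped for lack of setup slack still need a manual fix or a different cell choice.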

9.4 Chapter Summary

Links to layout is an important part of the integration between the layout tool and DC. This chapter focussed on all aspects of exchanging data to and from layout tools, in order for DC to perform better optimization and fine-tune the design.

Issues related to transfer of clock tree information from the layout tool to DC were explained in detail. Cross checking the netlist generated by the layout tool against the original netlist remains a major bottleneck. Various alternatives were provided to the user in order to overcome this issue and choose the right solution.

Starting from how to generate a clean netlist from DC in order to minimize layout problems, this chapter covered placement and floorplanning, clock tree insertion, routing, extraction, and post-layout optimization techniques, including various methods to fix the hold-time violations. At each step, recommendations are provided to facilitate the user in choosing the right direction.

10

PHYSICAL SYNTHESIS

Time-to-market is rapidly shrinking while design complexities are increasing. The problem is further aggravated by shrinking geometries, forcing ASIC designers to think about power and cross-talk along with timing, much earlier in the design cycle. The exchange of data between layout tools and DC (or PT) is certainly not efficient. Time wasted during synthesis-layout iterations is still a major bottleneck.

The main cause of synthesis-layout iterations can be attributed to the traditional synthesis approach of relying on wire-load models to synthesize the design. The wire-load models are just estimates of the final routed design. They may differ considerably from the real extracted delays of the layout surface. Going back and forth from layout to synthesis solves this problem, however at the expense of time.

In order to alleviate this problem, Synopsys recently introduced a novel approach of synthesizing the design without the need for wire-load models. This new tool is called Physical Compiler (or PhyC) and it performs synthesis along with concurrent placement, based on the floorplan information. Combining synthesis and placement provides an accurate modeling of actual interconnect delays during synthesis. In addition this tool also minimizes the previous headache of passing data back and forth from the layout tool to the synthesis tool.

PhyC is a superset of DC and incorporates all commands of DC along with some of its own. It is invoked by typing: psyn_shell

10.1 Initial Setup

PhyC uses the same setup file as DC, .synopsys_dc.setup. The only difference is that, in addition to logical libraries, it also requires inclusion of physical libraries in the same file. A complete description of the syntax and usage is provided in Chapter 3.

10.1.1 Important Variables

Just like DC, the behavior of PhyC is controlled by variables. These variables can be incorporated in the setup file. For a complete listing of PhyC variables type:

psyn_shell> printvar

Some of the most commonly used variables are described below:

physopt_pnet_complete_blockage_layer_names

psyn_shell> set physopt_pnet_complete_blockage_layer_names \
            "metal1 metal2"

The above variable defines the power/ground layers that should be treated as blockages by PhyC. In general this is used for power and ground straps that are present in the floorplan. The idea is to tell PhyC not to place any cells underneath the straps. In the above case, both metal1 and metal2 layer names are used.

physopt_pnet_partial_blockage_layer_names

psyn_shell> set physopt_pnet_partial_blockage_layer_names \
            "metal1 metal2"

This variable allows PhyC limited flexibility in placing the cells under the power/ground straps. The cells will only slide under the straps, if their own layers do not collide with layers used by the power/ground straps. In the above case part of the cells that do not contain metal1 or metal2 layers are allowed to slide under the straps.

10.2 Modes of Operation

Physical synthesis can be performed using the following two modes:

1. RTL to Placed Gates (or RTL2PG)
2. Gates to Placed Gates (or G2PG)

PhyC requires the floorplan information in IEEE PDEF 3.0 format. This format is very similar to the SDF format. It contains the physical coordinates of cells, placement obstructions (such as pre-placed RAM/ROM's), power straps, chip boundary, pad/port locations etc. In other words this file contains all the necessary information required by PhyC to perform optimized placement.

10.2.1 RTL 2 Placed Gates

In this mode the input to PhyC is the RTL design, floorplan information in IEEE PDEF 3.0, I/O timing constraints and the physical libraries. The output is the structured netlist along with the placement data in PDEF 3.0 format. This is shown in Figure 10-1.

Similar to DC, PhyC provides a command that compiles the RTL to produce the optimized netlist along with the placement information. This command is called compile_physical with similar options as the original compile command of DC. The most commonly used options are listed below:

compile_physical -congestion -scan

The -congestion option can be used to invoke further optimization algorithms in order to reduce routing congestion. The -scan option is identical to the -scan option of the compile command in DC. Here also it only maps the design to the scan flops but does not stitch them into a scan chain.

The following script shows the RTL2PG physical synthesis flow. PhyC commands are highlighted in bold.

An example RTL2PG script

# Read the source RTL and floorplan information
read_verilog mydesign.v
read_pdef floorplan.pdef

# Define operating conditions and timing constraints.
# Note the absence of wire-load models.
# All other steps same as before.

current_design mydesign
uniquify
link
set_operating_conditions WORST
set_load 1.0 [all_outputs]
create_clock
set_clock_latency
set_clock_transition
set_dont_touch_network
set_input_delay
set_output_delay

# Define attributes for scan
set_scan_configuration
create_test_clock
set_test_hold 1
set_scan_signal test_scan_enable
set_scan_signal test_scan_in
set_scan_signal test_scan_out

# Synthesize the design using the physical information.
# Also synthesize to scan flops. No stitching done.

compile_physical -scan
check_test
preview_scan

# Stitch scan chains based on physical location of flops.
insert_scan -physical
check_test

# Write out structured netlist along with placement information.
write -f verilog -h -o mydesign_placed.sv
write_pdef -v3.0 -o mydesign_placed.pdef
exit

10.2.2 Gates to Placed Gates

In this mode the input to PhyC is the structured netlist instead of the RTL. The rest of the input and output files remain identical to the RTL2PG mode of operation.

In this mode the input to PhyC is a structural netlist that has previously been synthesized using the traditional approach (using the DC compile command utilizing the wire-load models). Only the placement of these gates is desired. Pictorially this is shown in Figure 10-2.

PhyC provides the following command to perform the G2PG operation:

physopt -congestion -timing_driven_congestion -scan_order

The -congestion option is the same as that described for the RTL2PG method. If the design constraints are such that only congestion is paramount (area centric designs with easy to meet timing requirements), this option should be used. The -timing_driven_congestion option should be used for those designs where both timing and congestion are important. This option performs timing driven placement of cells with congestion in mind.

The -scan_order option is used for ordering the scan chain based on the physical location of flops. This also helps tremendously in reducing the congestion.
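To see why ordering the chain by physical location reduces routing, consider the toy Python sketch below. It orders a handful of flops with a greedy nearest-neighbour walk and compares the total connection length against the original netlist order. The coordinates, the flop names, the Euclidean distance metric and the greedy heuristic are all assumptions for illustration, and the ordering algorithm actually used by physopt is not described in the text.

```python
import math

def order_scan_chain(flops, start=(0.0, 0.0)):
    """Greedy nearest-neighbour ordering: from the current location,
    always hop to the closest flop not yet in the chain."""
    remaining = dict(flops)
    order = []
    here = start
    while remaining:
        name = min(remaining,
                   key=lambda n: (math.dist(here, remaining[n]), n))
        order.append(name)
        here = remaining.pop(name)
    return order

def chain_length(flops, order, start=(0.0, 0.0)):
    """Total straight-line length of the scan connections, in order."""
    total, here = 0.0, start
    for name in order:
        total += math.dist(here, flops[name])
        here = flops[name]
    return total

# Made-up placement coordinates; the scan-in port is assumed at (0, 0).
flops = {
    "ff0": (10.0, 0.0),
    "ff1": (0.0, 1.0),
    "ff2": (11.0, 0.0),
    "ff3": (1.0, 1.0),
}

netlist_order = ["ff0", "ff1", "ff2", "ff3"]
placed_order = order_scan_chain(flops)

print(placed_order)
print(round(chain_length(flops, netlist_order), 2),
      round(chain_length(flops, placed_order), 2))
```

In this example the netlist order zigzags across the die, while the location-based order visits neighbouring flops first and needs far less wire.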

Within the G2PG mode, there are two sub-modes of operation. These sub-modes are called the "two pass method" and the "integrated method". These approaches relate to the way scan chain stitching is handled by physopt.

10.2.2.1 Two pass method

In this mode of operation, the scan chain is hooked up using the insert_scan command before physopt is run. The physopt command is subsequently used not only to perform placement, but also to order the scan chain based on the physical location of each flop.

The following script illustrates the two-pass G2PG flow. PhyC and scan chain linking commands are highlighted in bold.

An example of two-pass G2PG script

# Read the synthesized gate level netlist in "db" format.
# Assuming "compile -scan" was used to produce the "db" file.
# In other words, no scan stitching done, only the design has
# been synthesized directly to scan flops.

read_db mydesign.db

# Read the floorplan information
read_pdef floorplan.pdef

# Define operating conditions and timing constraints.
# Note the absence of wire-load models. The constraints are
# needed by PhyC to perform timing driven placement.

current_design mydesign
uniquify
link
set_operating_conditions WORST
set_load 1.0 [all_outputs]
create_clock
set_clock_latency
set_clock_transition
set_dont_touch_network
set_input_delay
set_output_delay

# Define attributes for scan
set_scan_configuration
create_test_clock
set_test_hold 1
set_scan_signal test_scan_enable
set_scan_signal test_scan_in
set_scan_signal test_scan_out

# Stitch scan chains. No ordering performed yet
insert_scan
check_test

# Perform timing driven placement along with reduced
# congestion. Also order the scan chain based on physical
# location of each flop.

physopt -timing_driven_congestion -scan_order
check_test

# Write out structured netlist along with placement information.
write -f verilog -h -o mydesign_placed.sv
write_pdef -v3.0 -o mydesign_placed.pdef
exit

Note: The assumption in the above script is that "compile -scan" along with other scan attributes was used to produce the starting "mydesign.db" file. Therefore all scan related attributes must already be part of the "db" file. Thus these attributes can be omitted from the above script. They have been provided just for sake of explanation.

10.2.2.2 Integrated method

In this mode of operation, physopt command is used not only to perform placement but also to stitch and order the scan chain based on physical proximity of the scan flops. Basically, it unifies the stitching and ordering functions. In other words, the use of insert_scan command has been eliminated thus simplifying the flow.

The following script illustrates the integrated G2PG flow. PhyC commands are highlighted in bold.

An example of integrated G2PG script

# Read the synthesized gate level netlist in "db" format.
# Assuming "compile -scan" was used to produce the "db" file.
# In other words, no scan stitching done, only the design has
# been synthesized directly to scan flops.

read_db mydesign.db

# Read the floorplan information
read_pdef floorplan.pdef

# Define operating conditions and timing constraints.
# Note the absence of wire-load models. The constraints are
# needed by PhyC to perform timing driven placement.

current_design mydesign
uniquify
link
set_operating_conditions WORST
set_load 1.0 [all_outputs]
create_clock
set_clock_latency
set_clock_transition
set_dont_touch_network
set_input_delay
set_output_delay

# Define attributes for scan
set_scan_configuration
create_test_clock
set_test_hold 1
set_scan_signal test_scan_enable
set_scan_signal test_scan_in
set_scan_signal test_scan_out

# Perform timing driven placement along with reduced
# congestion. Also stitch and order the scan chain
# based on physical location of each flop.

physopt -timing_driven_congestion -scan_order
check_test

# Write out structured netlist along with placement information.
write -f verilog -h -o mydesign_placed.sv
write_pdef -v3.0 -o mydesign_placed.pdef
exit

Note: The assumption in the above script is that "compile -scan" along with other scan attributes was used to produce the starting "mydesign.db" file. Therefore all scan related attributes must already be part of the "db" file. These attributes can therefore be omitted from the above script. They have been provided just for sake of explanation.

10.3 Other PhyC Commands

Unlike DC, PhyC does not contain many of its own commands. It must be noted that most of these commands are run "under the hood" by physopt or compile_physical. Users do not need to explicitly run these commands. They are intended for further massaging the layout surface, if needed. Therefore, the commands here are provided to the user with the sole intention of showing "what else is there". No description is provided. Users are advised to read the Physical Compiler Users Manual for full description and usage of these commands.

Following are some of these commands:

create_placement
legalize_placement
check_legality
run_router
set_congestion_options
report_congestion
set_dont_touch_placement
remove_dont_touch_placement

10.4 Physical Compiler Issues.

Unfortunately as with most EDA tools, PhyC also suffers from several issues. These issues all relate to PhyC version 2001-SP1. It is expected that later versions may have solved some of these problems. Some of the critical ones are listed below:

1. When reading a gate-level netlist using read_verilog, PhyC outputs a lot of assign statements in the final verilog netlist. This only happens when performing scan chain ordering, and occurs even after using the following hidden variable:

set physopt_fix_multiple_port_nets true

When reading a precompiled db file and performing the same operations (including using the above variable), no verilog assign statements are generated. If a gate-level netlist is compiled into the db format (read_verilog followed by write -f db), PhyC still produces the verilog assign statements. The only way to prevent this is to use the following command in conjunction with the above variable:

set_fix_multiple_port_nets -all -buffer_constants

This is obviously a nuisance and hopefully will be corrected by Synopsys soon.

2. The heavily promoted "integrated physopt flow" by Synopsys does not work "as advertised". The idea is to compile the design using one-pass scan synthesis and then run physopt with scan order option. Physopt is supposed to perform scan stitching, ordering as well as placement. However, PhyC aborts and complains that the design is not scan ready even when check_test passes. This problem is solved by using the following attribute just before physopt is run:

set_attribute <design name> is_test_ready true -type boolean

For some reason, it is necessary to explicitly inform PhyC that the design is scan ready. This problem is well documented in the Solv-Net database and will be corrected in near future.

3. Sometimes for under utilized designs, PhyC produces a clumping effect for the placed cells. In other words, if the design is pad limited and logic area is very small compared to the overall chip area, the cell placement is not optimal. They are clumped together in several clusters. The clumping causes a localized routing congestion problem. The following command may be used before running physopt in order to spread out the congestion:

set_congestion_options -max_util <number>

Unfortunately, there is no magic number that works well in all cases. This is a hit and trial method. Users are advised to read the man pages of this command and make their own informed decision based on their design.

4. Placement of multi-row height cells is not supported in the current version. This is an important feature that permits designers to not only place standard cells, but also macros (like RAM, ROM’s, PLL’s etc.) automatically. The traditional flow is to pre-place these macros before running the physopt placement. Synopsys have announced that this capability will be added to PhyC in the near future.

10.5 Back-End Flow

Synopsys recently announced the availability of two new add-on options to the PhyC. These are the Clock Tree Compiler and the Route Compiler. With these options enabled, PhyC becomes the only EDA tool to provide a complete solution starting from RTL synthesis to the final GDSII. Based on a common timing engine across the entire flow and providing additional capabilities such as signal integrity and cross talk analysis, this tool becomes extremely powerful.

At the time of writing this book these new capabilities were not available. Therefore, the rest of the flow based on this technology is not provided. Those users that do not have the clock tree and the route compiler may use their own layout tool and proceed from the clock tree insertion phase (described in Chapter 9).

10.6 Chapter Summary

This chapter described the usage and operation of the Physical Compiler. With the introduction of this capability, Synopsys has solved the long-standing problem of discrepancy between the delays estimated by the wire-load models and the final resulting routed design.

Different flows and techniques were described along with helpful scripts to guide the user in performing successful synthesis, placement and scan chain ordering.

A few problems associated with PhyC were also discussed. In time these problems will most certainly be corrected. Still, it is the intent of this chapter to make readers aware of these issues in case they are using this version of PhyC.

Several new add-ons to Physical Compiler have recently been announced (the Clock Tree Compiler and the Route Compiler) that will enhance the capability of this tool enormously. These add-ons have been mentioned in this chapter; however, the usage and operation have not been described due to the unavailability of these options at the time of writing this book.


11

SDF GENERATION
For Dynamic Timing Simulation

The standard delay format, or SDF, contains timing information for all the cells in the design. It is used to provide timing information for simulating the gate-level netlist.

As mentioned in Chapter 1, verification of the gate-level netlist of a design through dynamic simulation is not a recommended approach. The dynamic simulation method is used to verify the functionality of the design at the RTL level only. Verification of a gate-level design using dynamic simulation depends solely on the coverage provided by the test-bench; therefore, paths in the design that are not sensitized will not get tested. In contrast, formal verification techniques provide superior validation of the design.

The dynamic simulation method for gate-level design verification is nevertheless still dominant, and is used extensively by designers. For this reason, this chapter provides a brief description of generating the SDF file from DC and PT, which can be used to perform dynamic timing simulation of the design. Furthermore, a few innovative ideas and suggestions are provided to help designers perform successful simulation.

Please note that some of the information provided in this chapter is also described in previous chapters. Since this is an important topic, a full chapter is devoted to SDF generation for the sake of clarity and completeness.

11.1 SDF File

The SDF file contains timing information for each cell in the design. The basic timing data comprises the following:

a) IOPATH delay.
b) INTERCONNECT delay.
c) SETUP timing check.
d) HOLD timing check.

The following is an example SDF file that contains the timing information for two cells (a sequential cell feeding an AND gate), along with the interconnect delay between them:

(DELAYFILE
  (SDFVERSION "OVI 2.1")
  (DESIGN "top_level")
  (DATE "Dec 30 1997")
  (VENDOR "std_cell_lib")
  (PROGRAM "Synopsys Design Compiler cmos")
  (VERSION "1998.08")
  (DIVIDER /)
  (VOLTAGE 2.70:2.70:2.70)
  (PROCESS "WORST")
  (TEMPERATURE 100.00:100.00:100.00)
  (TIMESCALE 1ns)
  (CELL
    (CELLTYPE "top_level")
    (INSTANCE)
    (DELAY
      (ABSOLUTE
        (INTERCONNECT sub1/U1/Q sub1/U2/A1 (0.02:0.03:0.04) (0.03:0.04:0.05))
      )
    )
  )
  (CELL
    (CELLTYPE "dff1")
    (INSTANCE sub1/U1)
    (DELAY
      (ABSOLUTE
        (IOPATH CLK Q (0.1:0.2:0.3) (0.1:0.2:0.3))
      )
    )
    (TIMINGCHECK
      (SETUP (posedge D) (posedge CLK) (0.5:0.5:0.5))
      (SETUP (negedge D) (posedge CLK) (0.6:0.6:0.6))
      (HOLD (posedge D) (posedge CLK) (0.001:0.001:0.001))
      (HOLD (negedge D) (posedge CLK) (0.001:0.001:0.001))
    )
  )
  (CELL
    (CELLTYPE "and2")
    (INSTANCE sub1/U2)
    (DELAY
      (ABSOLUTE
        (IOPATH A1 Z (0.16:0.24:0.34) (0.12:0.23:0.32))
        (IOPATH A2 Z (0.11:0.21:0.32) (0.17:0.22:0.34))
      )
    )
  )
)

The IOPATH delay specifies the cell delay. Its computation is based upon the output wire loading and the transition time of the input signal.

The INTERCONNECT delay is a path-based, point-to-point delay, which accounts for the RC delay between the driving gate and the driven gate. This wire delay is specified from the output pin of the driving cell to the input pin of the driven cell.

The SETUP and HOLD timing checks contain values that determine the required setup and hold-time of each sequential cell. These numbers are based upon the characterized values in the technology library.
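As a worked example of reading these values: each parenthesized triplet in the SDF file is a min:typ:max delay, and where two triplets are given, the first applies to a rising transition and the second to a falling transition. The following sketch is plain Tcl, not a dc_shell or pt_shell command, and the proc name sdf_max is hypothetical. It adds up the worst-case (max) rising delay from the CLK pin of sub1/U1 to the Z pin of sub1/U2, using the numbers from the example file above.

```tcl
# Return the max field of an SDF min:typ:max triplet such as "0.1:0.2:0.3".
proc sdf_max {triplet} {
    return [lindex [split $triplet ":"] 2]
}

# Rising-transition triplets copied from the example SDF file.
set clk_to_q [sdf_max "0.1:0.2:0.3"]     ;# IOPATH CLK Q of sub1/U1
set wire     [sdf_max "0.02:0.03:0.04"]  ;# INTERCONNECT sub1/U1/Q to sub1/U2/A1
set a1_to_z  [sdf_max "0.16:0.24:0.34"]  ;# IOPATH A1 Z of sub1/U2

# Sum the three contributions and round to two decimals (TIMESCALE is 1ns).
set total [format "%.2f" [expr {$clk_to_q + $wire + $a1_to_z}]]
puts "worst-case CLK-to-Z delay: $total ns"
```

With the example values this gives 0.3 + 0.04 + 0.34 = 0.68 ns.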

11.2 SDF File Generation

The SDF file may be generated for pre-layout or post-layout simulations. The post-layout SDF is generated from DC or PT after back-annotating the extracted RC delay values and parasitic capacitances to DC or PT. The post-layout values thus represent the actual delays associated with the design. The following commands may be used to generate the SDF file:

DC Command

write_timing -format sdf-v2.1 -output <filename>

PT Command

write_sdf -version [1.0 or 2.1] <filename>

Note: By default, PT generates the 2.1 version of SDF.

11.2.1 Generating Pre-Layout SDF File

The pre-layout numbers contain delay values that are based upon the wire-load models. Also, the pre-layout netlist does not contain the clock tree. Therefore, it is necessary to approximate the post-route clock tree delays while generating the pre-layout SDF.

In order to generate the pre-layout SDF, the following commands approximate the post-route clock tree values by defining the clock delay, skew, and the transition time.

DC & PT Commands

create_clock -period 30 -waveform [list 0 15] [list CLK]

set_clock_latency 2.0 [get_clocks CLK]

set_clock_transition 0.2 [get_clocks CLK]

By setting (fixing) these values as illustrated above, designers may assume that the resulting SDF file will also contain these values, i.e., that the clock delay from the source to the endpoints (the clock input pins of the flops) is fixed at 2.0. However, this is not the case. DC uses the above commands only to perform static timing analysis, and does not output this information to the SDF file. To avoid this problem, the designer should force DC to use the specified delay value instead of calculating its own. To ensure the inclusion of 2.0 ns as the clock delay, a dc_shell command (explained later in this section) is used to “massage” the resulting SDF.

At the pre-layout level, the clock transition time should also be specified. Failure to fix the transition time of the clock results in false delay values computed for the driven flops. Again, the culprit is the missing clock tree at the pre-layout level. Without the clock tree, the clock source appears to drive a very high fanout, which in turn causes DC to compute slow transition times for the entire clock network. The slow transition times affect the driven flops (endpoint flops), resulting in large delay values computed for them.

Consider the diagram shown in Figure 11-1. The dotted lines illustrate the placement of clock tree buffers, synthesized during layout. At the pre-layout level, these buffers do not yet exist. However, there usually is a buffer/cell (shown as the shaded cell) at the clock source. This cell may be a big driver, instantiated by the designer with the sole purpose of driving the future clock tree, or it may simply be an input pad. Let us assume that this is an input pad (called CLKPAD) with input pin A and output pin Z. At the pre-layout level, the output pin Z connects directly to all the endpoints.

The easiest way to fix the SDF, so that it reflects the 2.0 ns clock delay from the source “CLK” to all the endpoints, is to replace the delay value of the shaded cell (from pin A to pin Z) calculated by DC with 2.0 ns. This can be achieved by using the following dc_shell command:

dc_shell-t> set_annotated_delay 2.0 -cell \
                -from CLKPAD/A -to CLKPAD/Z

Note: A similar command also exists for PT.

The above command replaces the value calculated by DC with the one specified, i.e., 2.0 ns. This delay is reflected in the SDF file as the IOPATH delay for the cell CLKPAD, from pin A to pin Z.

Fixing the delay value of the clock solves the problem of clock latency. However, what happens to the delay values of the driven flops? Designers may incorrectly assume that DC uses the specified clock transition for the sole purpose of performing static timing analysis, and does not use it to calculate the delays of the driven flops. This is not so. DC uses the fixed transition value of the clock both to perform static timing analysis and to compute the delays of the driven cells. Thus the SDF file contains the delay values that are approximated by the designer at the pre-layout phase.

11.2.2 Generating Post-Layout SDF File

The post-layout design contains the clock tree information. Therefore, all the steps that were needed to fix the clock latency, skew, and clock transition time during the pre-layout phase are not required for post-layout SDF file generation. Instead, the clock is propagated through the clock network to provide the real delays and transition times.

As explained in Chapter 9, only the extracted parasitic capacitances and RC delays should be back-annotated to DC or PT for final SDF generation.

The following commands may be used to back-annotate the extracted data to the design and specify the clock information while generating the post-layout SDF file for simulation:

DC & PT Commands

read_sdf <interconnect RC’s in SDF format>

source <parasitic capacitances in set_load format>

read_parasitics <DSPF, RSPF or SPEF file for clocks + other critical nets>

create_clock -period 30 -waveform [list 0 15] [list CLK]

set_propagated_clock [get_clocks CLK]

11.2.3 Issues Related to Timing Checks

Sometimes, during simulation, unknowns (X’s) are generated that cause the simulation to fail. These unknowns are generated due to the violation of setup/hold timing checks. Most of the time these violations are real; however, there are instances where a designer may want to ignore some violations related to parts of the design, but still verify others. This is generally unachievable, due to the simulator’s inability to turn off the X-generation on a selective basis.

Nearly all simulators provide capabilities to ignore the timing violations, generally for the whole design. They do not have the ability to ignore the timing violation for an isolated instance of a cell in the design. For this reason, designers are often forced to either modify the simulation library or live with the failed result.

Modifying the simulation library is also not a viable approach, since turning off the X-generation can only be performed on a cell. This cell may be instantiated multiple times in the design. Turning off the X-generation for this cell will prevent the simulator from generating X’s for all the instances of the cell in the design. This is definitely not desired, as it may mask the real timing problems lying elsewhere in the design.

For example, a design may contain multiple clock domains, with data traversing from one clock domain to the other through synchronization logic. Although this logic will work perfectly on a manufactured device, it may cause hold-time violations when simulated. This will cause the simulation to fail for the design.

Another example is related to the type of methodology used for synthesis. Some designers prefer to fix the hold-time violations only after layout. Failing to falsify or remove the hold-time values from the pre-layout SDF file may cause the simulator to generate an X (unknown) for the violating flop. This X may propagate to the rest of the logic, causing the whole simulation to fail.

To prevent these problems, one may need to selectively falsify the values of the setup and hold-time constructs in the SDF file for the simulation to succeed. The SDF file is instance based (rather than cell based), therefore selective targeting of the timing checks is easily attained. Instead of manually removing the setup and hold-time constructs from the SDF file, a better way is to zero out the setup and hold-times in the SDF file only for the violating flops, i.e., replace the existing setup and hold-time numbers with zeros. Back-annotating the zero value for the setup and hold-time to the simulator prevents it from generating unknowns (if both setup and hold-time are zero, there cannot be any violation), thus making the simulation run smoothly. The following dc_shell command may be used to perform this:

dc_shell-t> set_annotated_check 0 -setup -hold \
                -from REG1/CLK \
                -to REG1/D

Note: A similar command also exists for PT.
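When several flops need the same treatment, the command can be generated in a loop. The following is a minimal sketch in plain Tcl, assuming a hypothetical list of instance names and assuming that every flop uses pins named CLK and D. The proc name zero_check_cmds is illustrative and is not a Synopsys command. The sketch only builds and prints the command strings, so it can be tried in any Tcl interpreter.

```tcl
# Hypothetical list of flop instances whose timing checks should be zeroed.
set violating_flops [list REG1 sync/REG2]

# Build one set_annotated_check command string per flop instance.
# The pin names CLK and D are assumptions; adjust them to the library in use.
proc zero_check_cmds {flops {clk_pin CLK} {data_pin D}} {
    set cmds [list]
    foreach ff $flops {
        lappend cmds "set_annotated_check 0 -setup -hold -from $ff/$clk_pin -to $ff/$data_pin"
    }
    return $cmds
}

foreach cmd [zero_check_cmds $violating_flops] {
    puts $cmd
}
```

Inside dc_shell-t, the printed lines could be saved to a script and sourced, or each string could be passed to eval instead of puts.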

11.2.4 False Delay Calculation Problem

This topic is covered in Chapter 4, but is included here for the sake of completeness.

The delay calculation of a cell is based upon the input transition time and the output load capacitance of the cell. The input transition time of a cell is evaluated based upon the transition delay of the driving cell (previous cell). If the driving cell contains more than one timing arc, then the worst transition time is used as input to the driven cell. This causes a major problem when generating the SDF file for simulation purposes.

Consider the logic shown in Figure 11-2. The signals reset and signal_a are inputs to the instance U1. Let us presume that the reset signal is not critical, while signal_a is the one that we are really interested in. The reset signal is a slow signal, so its transition time is longer than that of signal_a, which has a faster transition time. This causes two transition delays to be computed for cell U1 (2 ns from A to Z, and 0.3 ns from B to Z). When generating SDF, the two values will be written out separately as part of the cell delay for the cell U1. However, the question now arises: which of the two values does DC use to compute the input transition time for cell U2? DC uses the worst (maximum) transition value of the preceding gate (U1) as the input transition time for the driven gate (U2). Since the transition time of the reset signal is longer than that of signal_a, the 2 ns value will be used as the input transition time for U2. This causes a large delay value to be computed for cell U2 (shaded cell).

To avoid this problem, one needs to instruct DC not to perform the delay calculation for the timing arc from pin A to pin Z of cell U1. This step should be performed before writing out the SDF. The following dc_shell command may be used to perform this:

dc_shell-t> set_disable_timing U1 -from A -to Z

Unfortunately, this problem also exists during static timing analysis. Failure to disable the timing computation of the false path leads to large delay values computed for the driven cell.

11.2.5 Putting it Together

The following DC scripts combine all the information provided above and may be used to generate the pre- and post-layout SDF, to be used for timing simulation of an example tap controller design.

DC script for pre-layout SDF generation

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

set_wire_load_model LARGE
set_wire_load_mode top
set_operating_conditions WORST

create_clock -period 33 -waveform [list 0 16.5] tck
set_clock_latency 2.0 [get_clocks tck]
set_clock_transition 0.2 [get_clocks tck]

set_driving_cell -cell BUFF1 -pin Z [all_inputs]
set_drive 0 [list tck trst]
set_load 50 [all_outputs]

set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

# Approximate the clock tree delay
set_annotated_delay 2.0 -cell -from CLKPAD/A \
                        -to CLKPAD/Z

# Assuming, only REG1 flop is violating hold-time
set_annotated_check 0 -setup -hold \
                    -from REG1/CLK -to REG1/D

write_timing -format sdf-v2.1 \
             -output $active_design.sdf

DC script for post-layout SDF generation

set active_design tap_controller

read_db $active_design.db

current_design $active_design
link

set_operating_conditions BEST

source capacitance.dc      ;# actual parasitic capacitances
read_timing rc_delays.sdf  ;# actual RC delays

create_clock -period 33 -waveform [list 0 16.5] tck
set_propagated_clock [get_clocks tck]

set_driving_cell -cell BUFF1 -pin Z [all_inputs]
set_drive 0 [list tck trst]

set_load 50 [all_outputs]

set_input_delay 20.0 -clock tck -max [all_inputs]
set_output_delay 10.0 -clock tck -max [all_outputs]

# Assuming, only REG1 flop is violating hold-time
set_annotated_check 0 -setup -hold \
                    -from REG1/CLK -to REG1/D

write_timing -format sdf-v2.1 \
             -output $active_design.sdf

11.3 Chapter Summary

The SDF file is used extensively throughout the ASIC world to perform dynamic timing simulations. The chapter briefly summarized the contents of the SDF file that are relevant to the discussions that followed.

The chapter also discussed procedures for generating the SDF file from DC and PT, both for pre-layout and post-layout simulations. Along with command descriptions, various helpful techniques were described to “massage” the SDF in order for the simulation to succeed. These include fixing the clock latency and clock transition at the pre-layout level, and avoiding the propagation of unknowns from selected logic in the design.

The final section gathered all the information and put it together in the form of DC scripts for pre- and post-layout SDF generation.


12

PRIMETIME BASICS

PrimeTime (PT) is a sign-off quality static timing analysis tool from Synopsys. Static timing analysis, or STA, is without a doubt the most important step in the design flow. It determines whether the design works at the required speed. PT analyzes the timing delays in the design and flags violations that must be corrected.

PT, similar to DC, provides a GUI along with the command-line interface. The GUI contains various windows that help analyze the design graphically. Although the GUI is a good starting point, most users quickly migrate to using the command-line interface. Therefore, the intent of this chapter is to focus solely on the command-line interface of PT.

This chapter introduces the reader to the basics of PT, including a brief section devoted to the Tcl language that is used by PT. Also described in this chapter are selected PT commands that are used to perform successful STA, and that help the designer debug the design for possible timing violations.

12.1 Introduction

PT is a stand-alone tool that is not integrated under the DC suite of tools. It is a separate tool, which works alongside DC. Both PT and DC have consistent commands, generate similar reports, and support common file formats. In addition, PT can also generate timing assertions that DC can use for synthesis and optimization. PT’s command-line interface is based on the industry-standard language called Tcl. In contrast to DC’s internal STA engine, PT is faster, takes up less memory, and has additional features.

12.1.1 Invoking PT

PT may be invoked in the command-line mode using the command pt_shell or in the GUI mode through the command primetime.

Command-line mode:
> pt_shell

GUI-mode:
> primetime

12.1.2 PrimeTime Environment

Upon invocation PT looks for a file called “.synopsys_pt.setup” and includes it by default. It first searches for this file in the current directory and, failing that, looks for it in the user’s home directory before using the default setup file present in the PT installation site. This file contains the necessary setup variables defining the design environment that are used by PT, as exemplified below:

set search_path [list . /usr/golden/library/std_cells]

set link_path [list {*} ex25_worst.db ex25_best.db]

The variable search_path defines a list containing directories to look into while searching for libraries and designs. It saves the tedium of typing in complete file paths when referring to libraries and designs.

The variable link_path defines a list of libraries containing cells to be used for linking the design. These libraries are searched in the directories specified in the search_path. In the above example there are three elements in the list defined by the link_path variable. The ‘*’ indicates designs loaded in memory, while the other two are names pertaining to the best and the worst case standard cell technology libraries.

Another commonly used method of setting up the environment, if we do not want to use the “.synopsys_pt.setup” file, is to use the source command. The source command works just like DC’s include command. It includes and runs the file as if it were a script within the current environment. This command is invoked within pt_shell. For example:

pt_shell> source ex25.env

12.1.3 Automatic Command Conversion

Most of the DC commands are similar to PT commands, the exception being that PT, being Tcl based, uses the Tcl language format. DC commands therefore need to be converted to the Tcl format before PT can use them.

PT offers a conversion script that may be used to convert almost all dc_shell commands to the pt_shell, Tcl-based format. The script is called transcript and is provided by Synopsys as a separate stand-alone utility. This script is executed from a UNIX shell as follows:

> transcript <dc_shell script filename> <pt_shell script filename>

12.2 Tcl Basics

Tcl provides most of the basic programming constructs: variables, operators, expressions, control flow, loops, procedures, etc. In addition, Tcl also supports most of the UNIX commands.

Tcl programs are a list of commands. Commands may be nested within other commands through a process called command substitution.

Variables are defined and values assigned to them using the set command. For example:

set clock_name clk

set clock_period 20

Values can be numbers or strings; Tcl does not distinguish between variables holding numeric values and variables holding strings. It automatically uses the numeric value in an arithmetic context. In the above examples a variable clock_name has been set to the string clk and a variable clock_period has been set to the numeric value 20. Variables are used by preceding the variable name with a $. If the $ is not added before the variable name, Tcl treats it as a string.

create_clock $clock_name -period 20 -waveform {0 10}

Arithmetic operations are performed through the expr command. This useful technique provides a means for global parameterization across the entire script.

expr $clock_period / 5

The above command returns the value 4. Tcl provides all the standard arithmetic operators such as *, /, +, -, etc.
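To illustrate the parameterization idea, here is a minimal sketch in plain Tcl. The variable names half_period, inp_del, int_div and real_div are illustrative only. The expressions are enclosed in braces, which is the usual Tcl idiom, and the last two lines show that expr performs integer division when both operands are integers.

```tcl
set clock_period 20

# Derive dependent values from the single clock_period parameter.
set half_period [expr {$clock_period / 2}]   ;# 10
set inp_del     [expr {$clock_period / 5}]   ;# 4

# Integer operands give integer division; use a real operand for a real result.
set int_div  [expr {25 / 2}]     ;# 12
set real_div [expr {25 / 2.0}]   ;# 12.5

puts "waveform: [list 0 $half_period], input delay: $inp_del"
```

Changing clock_period in one place then updates every value derived from it.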

12.2.1 Command Substitution

Commands return a value that can be used in other commands. This nesting of commands is accomplished with the square brackets [ and ]. When a command is enclosed in square brackets, it is evaluated first and its value substituted in place of the command. For example:

set clock_period 20

set inp_del [expr $clock_period / 5]

The above command first evaluates the expression in the square brackets and then replaces the command with the evaluated value; in this case inp_del inherits the value of 4. Commands may also be nested to any depth. For example:

set clock_period 20

set inp_del [expr [expr $clock_period / 5] + 1]

The above example has 2 levels of nesting. The inner command returns a value of 4 obtained by dividing the value of clock_period by 5. The outer expr command adds 1 to the result of the inner command. Thus inp_del gets set to a value of 5.

12.2.2 Lists

Lists represent a collection of objects; these objects can be either strings or lists. In the most basic form, enclosing a set of items in braces creates a list.

set clk_list {clk1 clk2 clk3}

In the above example, the set command creates a list called clk_list with 3 elements, clk1, clk2 and clk3.

Another method of creating lists is through the list command, which is typically used in command substitution. For example, the following list command creates a list just like the previous set command:

list clk1 clk2 clk3

The list command is suitable for use in command substitution because it returns a value that is a list. The set command example may also be written as:

set clk_list [list clk1 clk2 clk3]

Tcl provides a group of commands to manipulate lists: concat, join, lappend, lindex, linsert, list, llength, lrange, lreplace, lsearch, lsort and split. For example, concat concatenates two lists together and returns a new list.

set new_list [ concat [list clk1 clk2] [list clk3 clk4] ]

In the above example the variable new_list is the result of concatenating the list clk1 & clk2 and the list clk3 & clk4, each of which is formed by the list command.
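A few of the other list commands named above can be sketched as follows. This is plain Tcl with illustrative variable names; the trailing comments show the value each command returns.

```tcl
set clk_list [list clk1 clk2 clk3]

lappend clk_list clk4                  ;# appends to the variable in place
set n      [llength $clk_list]         ;# 4
set first  [lindex $clk_list 0]        ;# clk1 (indices start at 0)
set pos    [lsearch $clk_list clk3]    ;# 2
set middle [lrange $clk_list 1 2]      ;# clk2 clk3
set csv    [join $clk_list ","]        ;# clk1,clk2,clk3,clk4

puts "$n $first $pos {$middle} $csv"
```

Note that lappend takes the name of the list variable (no $), whereas the other commands take the list value.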

Conceptually a simple list is a string containing elements that are separated by white space. In most cases the following two are equivalent:

{clk1 clk2 clk3}

"clk1 clk2 clk3"

In some cases the second representation is preferred because it allows variable substitution while the first representation does not. For example:

set stdlibpath [list "/usr/lib/stdlib25" "usr/lib/padlib25"]

set link_path "/project/bigdeal/lib $stdlibpath"
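The difference between the two representations can be seen in a small experiment. This is a plain Tcl sketch; the variable names lib, braced and quoted are illustrative.

```tcl
set lib stdlib25

# Braces suppress substitution: the text is kept literally.
set braced {/usr/lib/$lib}

# Double quotes allow substitution: $lib is replaced by its value.
set quoted "/usr/lib/$lib"

puts $braced   ;# prints /usr/lib/$lib
puts $quoted   ;# prints /usr/lib/stdlib25
```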

For more details about the syntax of other list commands, refer to any standard book on the Tcl language.

12.2.3 Flow Control and Loops

Like other scripting and programming languages, Tcl provides if and switch commands for flow control. It also provides for and while loops for looping. The if command may be used along with else or elseif clauses to completely specify the process flow. The arguments to the if, elseif and else clauses are usually lists, enclosed in braces to prevent any substitution. For example:

if {$port == "clk"} {
    create_clock -period 10 -waveform [list 0 5] $port
} elseif {$port == "clkdiv2"} {
    create_generated_clock -divide_by 2 -source clk $port
} else {
    echo "$port is not a clock port"
}
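The same branching structure can be tried outside PT. The following is a sketch in plain Tcl, in which the proc name classify_port and the port names are hypothetical and puts stands in for the tool's echo command. It combines if/elseif/else with a foreach loop.

```tcl
# Classify a port name using if / elseif / else.
proc classify_port {port} {
    if {$port == "clk"} {
        return "primary clock"
    } elseif {$port == "clkdiv2"} {
        return "generated clock"
    } else {
        return "not a clock port"
    }
}

# Loop over a list of port names and report each one.
foreach port [list clk clkdiv2 reset] {
    puts "${port}: [classify_port $port]"
}
```

Note that the closing brace of each branch and the following elseif or else keyword must be on the same line, because a newline ends a Tcl command.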

12.3 PrimeTime Commands

PT uses commands similar to those of DC to perform timing analysis and related functions. Since all relevant dc_shell commands are explained in detail in Chapter 6, a comprehensive explanation is not provided in this section for all related commands.

12.3.1 Design Entry

Unlike DC, which can read RTL source files through HDL Compiler, PT, being a static analysis engine, can only read mapped designs. This forms the basis of design entry to PT. Among others, the input to PT can be a file in db, Verilog, VHDL or EDIF format. The following pt_shell commands, appropriate to each format, are used to read the design into PT:

read_db -netlist_only <design name>.db #db format

read_verilog <design name>.sv #verilog format

read_vhdl <design name>.svhd #vhdl format

read_edif <design name>.edf #EDIF format

Since the netlist in db format can also contain constraints and/or environmental attributes (perhaps saved by the designer), the -netlist_only option may be used with the read_db command to instruct PT to load only the structural netlist. This prevents PT from reading the constraints and/or other attributes associated with the design.

12.3.2 Clock Specification

The concepts behind clock specification remain the same as those described for DC in Chapter 6. Subtle syntax differences exist due to the difference in formats between the two. However, because clock specification may become complex, especially if there are internally generated clocks with clock division, this section covers the complete PT clock specification techniques and syntax.

12.3.2.1 Creating Clocks

Primary clocks are defined as follows:

create_clock -period <value>
             -waveform {<rising edge> <falling edge>}
             <source list>

pt_shell> create_clock -period 20 -waveform {0 10} \
              [list CLK]

The above example creates a single clock named CLK having a period of 20ns, with rising and falling edges at 0ns and 10ns respectively.

12.3.2.2 Clock Latency and Clock Transition

The following commands are used to specify the clock latency and the clock transition. These commands are mainly used for pre-layout STA and are explained in detail in Chapter 13.

set_clock_latency <value> <clock list>

set_clock_transition <value> <clock list>

pt_shell> set_clock_latency 2.5 [get_clocks CLK]

pt_shell> set_clock_transition 0.2 [get_clocks CLK]

The above commands define the clock latency for the CLK port as 2.5ns with a fixed clock transition value of 0.2ns.

12.3.2.3 Propagating the Clock

Propagating the clock is usually done after the layout tool inserts the clock tree in the design, and the netlist is brought back to PT for STA. The clock is propagated through the entire clock tree network in the netlist in order to determine the clock latency. In other words, the delay across each cell in the clock tree and the interconnect wiring delay between the cells are taken into account.

The following command instructs PT to propagate the clock through the clock network:

set_propagated_clock <clock list>

pt_shell> set_propagated_clock [get_clocks CLK]

12.3.2.4 Specifying Clock Skew

Clock skew, or clock uncertainty as Synopsys prefers to call it, is the difference in the arrival times of the clock at the clock pins of the flops. In synchronous designs data gets launched by a flop at one clock edge and is received by another flop at another clock edge (usually the next clock edge). If the two clock edges (launch and receive) are derived from the same clock then ideally there should be an exact delay of one clock period between the two edges. Clock skew puts a crimp in this happy situation. Because of variation in routing delays (or a gated clock situation) the receiving clock edge may arrive early or late. Early arrival could cause setup-time violations and late arrival may cause hold-time violations. Therefore, it is imperative to specify the clock skew during the pre-layout phase, in order to produce robust designs.

Clock skew is specified through the following command:

set_clock_uncertainty <uncertainty value>
                      -from <from clock>
                      -to <to clock>
                      -setup
                      -hold
                      <object list>

In the following example, an uncertainty of 0.6ns is applied to both the setup and hold-time checks of the clock signal, CLK.

pt_shell> set_clock_uncertainty 0.6 [get_clocks CLK]

The -setup option applies the uncertainty value to setup-time checks, while the -hold option applies the uncertainty value to hold-time checks. It must be noted that different values for setup and hold cannot be specified within a single command. Two separate commands must be used for this purpose. For example:

pt_shell> set_clock_uncertainty 0.5 -hold [get_clocks CLK]

pt_shell> set_clock_uncertainty 1.5 -setup [get_clocks CLK]

Inter-clock skew can also be specified with the -from and -to options, which is useful for designs containing multiple clock domains. For example:

pt_shell> set_clock_uncertainty 0.5 -from [get_clocks CLK1] \
              -to [get_clocks CLK2]

Page 265: Advanced ASIC Chip Synthesis - Bhatnagar

PRIMETIME BASICS 249

12.3.2.5 Specifying Generated Clocks

This is an important feature that is absent from DC. Very often a design maycontain internally generated clocks. PT allows the user to define therelationship between the generated clock and the source clock, through thecommand create_generated_clock. This is convenient because pre-layoutscripts can be used for post-layout with minimal changes.

During post-layout timing analysis, clock tree is inserted and the clocklatency is calculated by propagating the clock signal through the clock treebuffers. Users may opt to define the divided clock independent to the sourceclock (by defining the clock on an output pin of the dividing logic sub-block). However, this approach forces designers to manually add the clocktree delay (from the dividing block to the rest of the design) to the clocklatency of the source clock to the dividing logic block.

By setting up a divided clock through the above command, the two clocksare kept in sync both in pre-layout and post-layout phases.

create_generated_clock -name <divided clock name>
                       -source <primary clock name>
                       -divide_by <value>
                       <pin name>

pt_shell> create_generated_clock -name DIV2CLK \
                                 -source CLK -divide_by 2 \
                                 blockA/DFF1X/Q

The above example creates a generated clock on pin Q of the cell DFF1X belonging to blockA. The name of the generated clock is DIV2CLK, having half the frequency of the source clock, CLK.
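A minimal sketch of how the source clock and the divided clock might sit together in one script is given below. The 10ns period and the use of get_ports/get_pins to name the objects are assumptions; the pin and clock names are those of the example above. The last command would be used for post-layout analysis only, where latency is calculated through the clock tree.

```tcl
# Source clock on port CLK; the 10ns period is an assumed value.
create_clock -period 10 -name CLK [get_ports CLK]

# Divide-by-2 clock defined on the divider flop output, tied to its source.
create_generated_clock -name DIV2CLK \
    -source [get_ports CLK] -divide_by 2 \
    [get_pins blockA/DFF1X/Q]

# Post-layout only: propagate both clocks through the clock tree buffers.
set_propagated_clock [get_clocks {CLK DIV2CLK}]
```

Because DIV2CLK is derived from CLK, the same script serves both phases; only the final command is added after layout.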

12.3.2.6 Clock Gating Checks

For low power applications, designers often resort to gating the clock in the design. This technique allows designers to enable the clock only when needed. The gating logic may produce a clipped clock or glitches if the setup and hold-time requirements of the gating logic are not met. PT allows designers to specify the setup/hold requirements for the gating logic, as follows:

set_clock_gating_check -setup <value> -hold <value> <object list>

pt_shell> set_clock_gating_check -setup 0.5 -hold 0.01 CLK

The above example informs PT that the setup-time and hold-time requirements for all the gates in the clock network of CLK are 0.5ns and 0.01ns respectively.

Gating checks on an individual cell can be accomplished by specifying the cell name in the object list. For example:

pt_shell> set_clock_gating_check -setup 0.05 -hold 0.01 \
                                 [get_lib_cell stdcell_lib/BUFF4X]

By default, PT performs the gating check with a zero value used for the setup and hold times, unless the library contains specific setup and hold times for the cell used to gate the clock. If the gating cell contains the setup/hold timing checks, then the gating check values may be automatically derived from the SDF file.

The clock gating checks are only performed for combinational cells. Also, the gating checks cannot be performed between two clocks.

12.3.3 Timing Analysis Commands

This section describes a selected set of PT commands that are used to perform STA. Only the most commonly used options are listed for these commands.

– set_disable_timing: Applications of this command include disabling the timing arc of a cell in order to break a combinational feedback loop, or instructing PT to exclude a particular timing arc (and thus the path segment) from analysis.

set_disable_timing -from <pin name> -to <pin name> <cell name>

pt_shell> set_disable_timing -from A1 -to ZN {INVD2}

– report_disable_timing: This command is used to display the timing arcs that were disabled by the user or by PT. The report identifies individual disabled paths, using the following flags:

Flags: u : Timing path disabled by the user.
       l : Timing loop broken by PT.
       c : Timing path disabled during case analysis.

– set_input_transition: This command is an alternative to the set_driving_cell command. It sets a fixed transition time that is not dependent on the net loading. This command is specified on input/inout ports of the design.

set_input_transition <value> <port list>

pt_shell> set_input_transition 0.2 [all_inputs]

pt_shell> set_input_transition 0.4 [list in1 in2]

– set_timing_derate: This command is used to derate the delay numbers shown in the timing report. PT provides this powerful capability, which is useful in adding extra timing margin to the entire design. The amount of deration is controlled by a fixed value, which is specified by the user. The original delay numbers are multiplied by this value before the timing report is generated.

set_timing_derate -min <value> -max <value>

pt_shell> set_timing_derate -min 0.2 -max 1.2

– set_case_analysis: This command performs case analysis and is one of the most useful features provided by PT. It is used to set a fixed logic value on a port (or pin) while performing STA.

set_case_analysis [0 | 1] <port or pin list>

pt_shell> set_case_analysis 0 scan_mode

Applications of this command include disabling timing paths that are not valid during a particular mode of operation. For instance, in the above example, the scan_mode port switches the design between the functional mode (normal operation) and the test mode of operation. The zero value set on the scan_mode port is propagated to all the cells driven by this port. This disables certain timing arcs of all cells that are related to the scan_mode port. Since testability logic is usually not timing-critical, disabling the timing arcs of the non-timing-critical paths allows the real timing-critical paths to be identified and analyzed. The usage of this command is further explained in Chapter 13.

– remove_case_analysis: This command is used to remove the case analysis values set by the above command.

remove_case_analysis <port or pin list>

pt_shell> remove_case_analysis scan_mode

– report_case_analysis: This command is used to display the case analysis values set by the user. PT displays a report that identifies the pin/port list along with the corresponding case analysis values.

pt_shell> report_case_analysis
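The three case analysis commands above are typically used together when a design is analyzed in more than one mode. The sketch below assumes that a value of 1 on scan_mode selects the test mode (the text only states that 0 is used for the functional run) and uses a generic report_timing call as a stand-in for whatever reports are actually needed.

```tcl
# Functional mode: hold scan_mode at 0 so that test-only arcs are disabled.
set_case_analysis 0 scan_mode
report_case_analysis
report_timing -to [all_registers -data_pins]

# Test mode (assumed polarity): clear the old value, then apply the new one.
remove_case_analysis scan_mode
set_case_analysis 1 scan_mode
report_case_analysis
report_timing -to [all_registers -data_pins]
```

Calling report_case_analysis after each change is a cheap way to confirm which value is in force before reading the timing report.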

– report_timing: Similar to DC, this command is used to generate the timing report of path segments in a design. This command is used extensively and provides ample flexibility that is helpful in focusing explicitly on an individual path, or on a collection of paths in a design.

report_timing -from <from list> -to <to list>
              -through <through list>
              -delay_type <delay type>
              -nets -capacitance -transition_time
              -max_paths <value> -nworst <value>

The -from and -to options allow the user to define a path for analysis. Since there may be multiple paths leading from a startpoint to a single endpoint, the -through option may be used to further isolate the required path segment for timing analysis.

pt_shell> report_timing -from [all_inputs] \
                        -to [all_registers -data_pins]

pt_shell> report_timing -from in1 \
                        -to blockA/subB/carry_reg1/D \
                        -through blockA/mux1/A1

The -delay_type option is used to specify the type of delay to be reported at an endpoint. Accepted values are max, min, min_max, max_rise, max_fall, min_rise, and min_fall. By default PT uses the max type, which reports the maximum delay between two points. The min type is used to display the minimum delay between two points. The max type is used for analyzing the design for setup-time, while the min type is used to perform hold-time analysis. The other types are not frequently used, and users are advised to refer to the PT User Guide for a full explanation of their usage.

pt_shell> report_timing -from [all_registers -clock_pins] \
                        -to [all_registers -data_pins] \
                        -delay_type min

The -nets, -capacitance and -transition_time options are among the most useful and frequently used options of the report_timing command. These options help the designer to debug a particular path, in order to track the cause of a possible violation. The -nets option displays the fanout of each cell in the path report, while the -capacitance and -transition_time options report the lumped capacitance on the net and the transition time (slew rate) for each driver or load pin, respectively. Failure to include these options results in a timing report that does not include the information mentioned above.

pt_shell> report_timing -from in1 \
                        -to blockA/subB/carry_reg1/D \
                        -nets -capacitance -transition_time

The -nworst option specifies the number of paths to be reported for each endpoint, while the -max_paths option defines the number of paths to be reported per path group for different endpoints. The default value of both these options is 1.

pt_shell> report_timing -from [all_inputs] \
                        -to [all_registers -data_pins] \
                        -nworst 1000 -max_paths 500

– report_constraint: Similar to DC, this command in PT checks for the DRCs as defined by the designer or the technology library. Additionally, this command is useful for determining the “overall health” of the design with regard to setup and hold-time violations. The syntax of this command along with the most commonly used options is:

report_constraint -all_violators
                  -max_transition -min_transition
                  -max_capacitance -min_capacitance
                  -max_fanout -min_fanout
                  -max_delay -min_delay
                  -clock_gating_setup -clock_gating_hold

The -all_violators option displays all constraint violators. Generally, this option is used to determine at a glance the overall condition of the design. The report summarizes all the violators, from the greatest to the least violator, for a particular constraint.

pt_shell> report_constraint -all_violators

Selective reports may be obtained by using the -max_transition, -min_transition, -max_capacitance, -min_capacitance, -max_fanout, -min_fanout, -max_delay, and -min_delay options. The -max_delay and -min_delay options report a summary of all setup and hold-time violations, while the others report the DRC violations. The -clock_gating_setup and -clock_gating_hold options are used to display the setup/hold-time reports for the cell used for gating the clock. In addition, there are other options available for this command that may be useful to the designer. Full details of these options may be found in the PT User Guide.

pt_shell> report_constraint -max_transition

pt_shell> report_constraint -min_capacitance

pt_shell> report_constraint -max_fanout

pt_shell> report_constraint -max_delay -min_delay

pt_shell> report_constraint -clock_gating_setup \
                            -clock_gating_hold

Initially use the report_constraint command to ascertain the amount and the number of violations. The report produced provides a general estimate of the overall health of the design. Depending upon the severity of violations, a possible re-synthesis of the design may need to be performed. To further isolate the cause of the violation, the report_timing command should be used to target the violating path, in order to display a full timing report.
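The two-step approach recommended above can be sketched as a short script. The endpoint name is reused from the earlier report_timing examples and stands in for whichever endpoint the summary actually flags; it is not a prescribed value.

```tcl
# Step 1: one-glance summary of every violated constraint, worst first.
report_constraint -all_violators

# Step 2: restrict the summary to setup and hold-time violations.
report_constraint -max_delay -min_delay

# Step 3: full report on one violating endpoint taken from the summary
# (hypothetical endpoint), with net, capacitance and transition details.
report_timing -to blockA/subB/carry_reg1/D \
    -nets -capacitance -transition_time
```

For a hold-time violation, the last command would also carry -delay_type min so that the minimum-delay path is reported.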

– report_bottleneck: This command is used to identify the leaf cells in the design that contribute to multiple violations. For instance, several violating path segments of a design may share a common leaf cell. Altering the size of this leaf cell (sizing up or down) may improve the timing (and thus remove the violation) of all the violating path segments. The syntax of this command along with the most commonly used options is:

report_bottleneck -from <from list> -to <to list>
                  -through <through list> -max_cells <value>
                  -max_paths <value> -nworst_paths <value>

The -from and -to options allow the user to define a path for bottleneck analysis. Since there may be multiple paths leading from a startpoint to a single endpoint, the -through option may be used to further isolate the required path segment for bottleneck analysis.

pt_shell> report_bottleneck -from in1 \
                            -to blockA/subB/carry_reg1/D \
                            -through blockA/mux1/A1

As the name suggests, the -max_cells option specifies the number of leaf cells to be reported. The default value is 20.

The -nworst_paths option specifies the number of paths to be reported for each endpoint, while the -max_paths option defines the number of paths to be reported per path group for different endpoints. The default value of both these options is 100.

pt_shell> report_bottleneck -from in1 \
                            -to blockA/subB/carry_reg1/D \
                            -through blockA/mux1/A1 \
                            -max_cells 50 \
                            -nworst_paths 500 -max_paths 200

12.3.4 Other Miscellaneous Commands

– write_sdf: This command generates the SDF file that contains delays and timing checks for each instance in the design. PT uses the wire-load models to estimate the delays of cells during the pre-layout phase. For post-layout, PT uses the actual annotated delays (from the physical layout) while generating the SDF file. The syntax of this command along with the most commonly used options is:

write_sdf -version <1.0 | 2.1>
          -no_net_delays
          -no_timing_checks
          <sdf output filename>

Unless explicitly specified otherwise, PT generates the SDF file in SDF version 2.1 format.

The -no_net_delays option specifies that the interconnect delays (INTERCONNECT field in the SDF file) are not to be written out separately in the SDF file. In this case, they are included as part of the IOPATH delay of each cell. This option is mainly used during the pre-layout phase because the interconnect delays are based upon the wire-load models. However, the interconnect delays after layout are real and are based on the routed design. Therefore, in general this option should be avoided while generating the post-layout SDF file.

pt_shell> write_sdf -no_net_delays top_prelayout.sdf

pt_shell> write_sdf top_postlayout.sdf

Specifying the -no_timing_checks option forces PT to omit the timing-checks section (TIMINGCHECK field) from the SDF file. As described in Chapter 11, the timing-checks section contains the setup/hold/width timing checks. This option is useful for generating an SDF file that is used to validate only the functionality of the design through dynamic simulation, without checking for setup/hold/width timing violations. Once the design passes functional validation, the full SDF (without the -no_timing_checks option) may be generated.

pt_shell> write_sdf -no_timing_checks top_prelayout.sdf

– write_sdf_constraints: This command is similar to the write_constraints command in DC and performs the same function. It is used to generate the path timing constraints in SDF format, which are used by the layout tool to perform timing driven layout. The syntax of this command along with the most commonly used options is:

write_sdf_constraints -version <1.0 | 2.1>
                      -from <from list> -to <to list>
                      -through <through list>
                      -cover_design
                      -slack_lesser_than <value>
                      -max_paths <value> -nworst <value>
                      <constraint filename>

Unless explicitly specified otherwise, PT generates the constraint file in SDF version 2.1 format. The -from, -to and -through options allow the user to specify a particular path to be written to the constraint file.

The -nworst option specifies the number of paths to be written to the constraint file for each endpoint, while the -max_paths option defines the number of paths to be considered for each constraint group. The default value of both these options is 1. The default settings of these options usually suffice for most designs.

pt_shell> write_sdf_constraints -from in1 \
                                -to blockA/subB/carry_reg1/D \
                                -through blockA/mux1/A1 \
                                tdl.sdf

The -cover_design option is used to generate just enough unique path timing constraints to cover the worst path for each path segment in the design. When specified, all other options such as -nworst, -to, -from and -through are ignored. Although this option is recommended by Synopsys, it should be used judiciously as it may produce long run-times, especially for large designs.

pt_shell> write_sdf_constraints -cover_design tdl.sdf

An alternative is to use the -slack_lesser_than option, which specifies that any path with a slack value greater than the one specified is to be ignored. A path segment with a negative slack value is considered to be most critical and has the highest priority. Thus, by specifying a low value for this option, only the critical paths are selected and written out to the constraint file. All paths with higher slack values (less critical paths) are ignored.

pt_shell> write_sdf_constraints -slack_lesser_than 1.5 tdl.sdf

– swap_cell: This command may be used to replace an existing cell in the design with another, having the same pinout.

swap_cell <cell list to be replaced> <new design>

For example, suppose a path is failing due to a hold-time violation and, in order to fix it, you want to see the effect on the reported slack of sizing down a particular leaf cell in the path, without changing the netlist. In this case the swap_cell command may be used at the command line to replace the existing cell with another that has the same pinout.

pt_shell> swap_cell {U1} [get_lib_cell stdcell_lib/AND2X2]

In the above example, the instance U1 (say, a 2-input AND gate with 8X drive strength) in a design is replaced by the AND2X2 gate (2X drive strength) from the “stdcell_lib” technology library.
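A what-if session built around swap_cell might look like the sketch below. The output pin name U1/Z is an assumption (the text names only the instance U1), and the hold-time report is simply run before and after the swap so that the two slack values can be compared.

```tcl
# Hold-time slack through the cell before the change (pin name assumed).
report_timing -through U1/Z -delay_type min

# Replace U1 with the 2X drive strength gate, as in the example above.
swap_cell {U1} [get_lib_cell stdcell_lib/AND2X2]

# Same report again; compare the slack against the first run.
report_timing -through U1/Z -delay_type min
```

Since the swap is made only inside the PT session, the netlist on disk is unchanged until the same substitution is made in the design database.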

12.4 Chapter Summary

Static timing analysis is one of the most critical steps in the entire ASIC chip synthesis flow. This chapter provides an introduction to PrimeTime that includes PrimeTime invocation and its environment settings.

PrimeTime is a stand-alone static timing analysis tool, which is based on the universally adopted EDA tool language, Tcl. A brief section is included on the Tcl language in the context of PrimeTime, to help the designer in writing PrimeTime scripts and building upon them to produce complex scripts.

The last section covers all relevant PrimeTime commands that may be used to perform static timing analysis, debug the design, and write delay information in SDF format. In addition, this section also covers topics on design entry and clock specification, both for pre-layout and post-layout.

13

STATIC TIMING ANALYSIS
Using PrimeTime

The key to working silicon usually lies in the successful completion of static timing analysis performed on a particular design. PT is a stand-alone tool by Synopsys that is used to perform static timing analysis. It not only checks the design for required constraints that are governed by the design specifications, but also performs comprehensive analysis of the design. This capability makes STA one of the most important steps in the entire design flow, and it is used by many designers as a sign-off criterion to the ASIC vendor.

This chapter illustrates the part of the design flow where PT is utilized. It covers both the pre-layout and the post-layout phases of the ASIC design flow process.

STA is closely integrated with the overall synthesis flow, therefore parts of this chapter may contain some repetition from elsewhere in this book.

13.1 Why Static Timing Analysis?

Traditional methods of analyzing gate-level designs using dynamic simulation are posing a bottleneck for large complex designs. Today, the trend is towards system-on-a-chip (SoC) designs, which may result in millions of gates per ASIC. Verifying such a design through dynamic simulation is a nightmare for designers and may prove to be impossible due to long run-times (usually days and sometimes weeks). Furthermore, dynamic simulation relies on the quality and coverage of the test-bench used for verification. Only the parts of the logic that are sensitized are tested, while the remaining parts of the design remain untested. To alleviate this problem, designers now resort to other means of verification, such as STA to verify the timing, and formal verification to verify the functionality of the gate-level netlist against the source RTL. However, comprehensive sets of test-benches are still needed to verify the functionality of the source RTL. Thus, dynamic simulation is needed solely to verify the functionality of the design at the RTL level. This results in a considerable reduction in run-time.

The STA approach is far faster than dynamic simulation and verifies all parts of the gate-level design for timing. Because the synthesis and STA engines are similar in nature, static timing analysis is well suited for verifying synthesized designs.

13.1.1 What to Analyze?

In general, four types of analysis are performed on the design, as follows:

a) From primary inputs to all flops in the design.
b) From flop to flop.
c) From flop to primary output of the design.
d) From primary inputs to primary outputs of the design.

All four types of analysis can be accomplished by using the following commands:

pt_shell> report_timing -from [all_inputs] \
                        -to [all_registers -data_pins]

pt_shell> report_timing -from [all_registers -clock_pins] \
                        -to [all_registers -data_pins]

pt_shell> report_timing -from [all_registers -clock_pins] \
                        -to [all_outputs]

pt_shell> report_timing -from [all_inputs] \
                        -to [all_outputs]

Although using the above commands is a cleaner method of generating reports that can be piped to individual files for analysis, PT takes longer to perform each operation. PT takes less time to generate the same results if the following commands are used:

pt_shell> report_timing -to [all_registers -data_pins]

pt_shell> report_timing -to [all_outputs]
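One way of sending each of the four analyses to its own file is sketched below. It assumes the pt_shell redirect command is available in the version in use, and the report file names are hypothetical.

```tcl
# Each redirect writes the output of the enclosed command to the named file.
redirect in2reg.rpt {
    report_timing -from [all_inputs] -to [all_registers -data_pins]
}
redirect reg2reg.rpt {
    report_timing -from [all_registers -clock_pins] -to [all_registers -data_pins]
}
redirect reg2out.rpt {
    report_timing -from [all_registers -clock_pins] -to [all_outputs]
}
redirect in2out.rpt {
    report_timing -from [all_inputs] -to [all_outputs]
}
```

The shorter pair of commands in the text trades this per-category separation for run-time, since only the endpoints are specified.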

13.2 Timing Exceptions

In most designs there may be paths that exhibit timing exceptions. For instance, some parts of the logic may have been designed to function as multicycle paths, while others may simply be false paths. Therefore, before analyzing the design, PT must be made aware of the special behavior exhibited by these paths. PT may report timing violations for multicycle paths if they are not specified as such. Also, path segments in the design that are not real must be identified and specified as false paths, in order to prevent PT from producing timing reports for these paths.

13.2.1 Multicycle Paths

By default, PT treats all paths in the design as single-cycle and performs the STA accordingly, i.e., data is launched from the driving flop using the first edge of the clock, and is captured by the receiving flop using the second edge of the clock. This means that the data must be received by the receiving flop within one clock cycle (a single clock period). In the multicycle mode, the data may take more than one clock cycle to reach its destination. The amount of time taken by the data to reach its destination is governed by the multiplier value used in the following command:

set_multicycle_path <multiplier value>
                    -from <from list> -to <to list>

Figure 13-1 illustrates the comparison between the single-cycle setup/hold-time relationship and the multicycle setup/hold-time relationship. In the multicycle definition, a multiplier value of 2 is used to inform PT that the data latching occurs at regB after an additional clock pulse. The following command was used:

pt_shell> set_multicycle_path 2 -from regA -to regB

In the case of generated clocks, PT does not automatically determine the relationship between the primary clock and the derived clock, even if the create_generated_clock command is used. The single-cycle determination is independent of whether one clock is generated or not. It is based on the smallest interval between the opening edge of the first clock and the closing edge of the second clock (in this case, the generated clock).

For separate clocks with different frequencies, the set_multicycle_path command may be used to define the relationship between these clocks. By default, PT uses the most restrictive setup-time and hold-time relationship between these clocks. These may be overridden by using the set_multicycle_path command, which defines the exact relationship between these clocks.

Figure 13-2 illustrates an example where a relationship exists between two separate clocks. During single-cycle timing (the default behavior), the setup and hold-time relationship occurs as shown. However, to specify a multicycle path between regA and regB, the following command is used:

pt_shell> set_multicycle_path 2 -setup \
                              -from regA/CP -to regB/D

The above example uses the multiplier value of 2 to define the setup-time relationship between the two clocks. The -setup option is used to define the setup-time relationship. However, this option also affects the hold-time relationship. PT uses a set of rules (explained in detail in the PT User Guide) to determine the most restrictive relationship for the hold-time between the two clocks. Therefore, PT may assume an incorrect hold-time relationship between the two clocks (shown as a dotted line in Figure 13-2). To avoid this situation, the hold-time relationship between the two clocks should also be defined. Specifying the hold-time relationship through the set_multicycle_path command is very confusing, and is therefore not a recommended approach. Designers are advised to use the following command to specify the hold-time relationship between the two flops:

pt_shell> set_min_delay 0 -from regA/CP -to regB/D

The zero value moves the hold-time relationship from the default value (dotted line in Figure 13-2) to the desired edge (bold line in Figure 13-2).
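The edge arithmetic behind this behavior can be worked through with a few lines of plain Tcl. The sketch covers only the simple case in which the launch and capture clocks share one period and rise at time 0 (an assumption; Figure 13-2 involves two separate clocks), and the procedure name and 10ns period are made up for illustration. No PrimeTime command is called.

```tcl
# Worked arithmetic only, for launch and capture clocks of equal period.
proc multicycle_edges {period setup_mult} {
    # Setup is checked at the Nth capture edge after the launch edge.
    set setup_edge [expr {$setup_mult * $period}]
    # With only -setup given, hold is checked one capture edge earlier.
    set default_hold_edge [expr {($setup_mult - 1) * $period}]
    return [list $setup_edge $default_hold_edge]
}

set edges [multicycle_edges 10.0 2]
puts "setup check at [lindex $edges 0] ns"
puts "default hold check at [lindex $edges 1] ns"
```

With a multiplier of 2, the setup check moves out to 20ns and the default hold check follows it to 10ns, one full period later than the launch edge at 0ns. That shifted hold check is the one the set_min_delay 0 command above brings back to the launch edge.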

13.2.2 False Paths

Some designs may contain false timing paths. A false path is identified as a timing path that does not propagate a signal. False paths are created through the following pt_shell command:

set_false_path -from <from list> -to <to list>
               -through <through list>

It must be noted that the above command does not disable the timing arc of any cell; it merely removes the constraints of the identified path. Therefore, if timing analysis is performed on the false path, an unconstrained timing report is generated.

By default, PT performs STA on all the paths. This results in the generation of timing reports for all the path segments (including the false paths in the design). If a false path segment is failing timing by a large amount, then the report may mask the violations of the real timing paths. This of course depends upon the options used for the report_timing command.

Let us presume that there are multiple false paths in the design and they are all failing by a large amount during hold-time STA, while the real timing paths are failing by a small margin. The false paths have not been identified because the user thinks that large values for the -nworst and -max_paths options will cover all the paths in the design (including the real timing paths), making identification of false paths unnecessary. The user uses the following command to analyze the design:

pt_shell> report_timing -from [all_inputs] \
                        -to [all_registers -data_pins] \
                        -nworst 10000 -max_paths 1000 \
                        -delay_type min

The above method is certainly a viable approach and may not overly impact the run-time. However, a large value for the -nworst and -max_paths options (used in the above example) causes PT to generate/display multiple timing reports, covering all the paths in the design, most of which are false paths. Only a select few timing reports relate to the real timing violations. With this approach, it becomes tedious to distinguish between the real timing paths and the false timing paths. In addition, due to the large number of timing reports generated, it is easy to overlook a real timing path that is violating the timing constraints. To avoid this situation, false path identification is recommended before performing STA.

In addition, designers may use the -through option to further isolate the false path. It must be noted that the -through option significantly impacts the run-time; it should therefore be used judiciously and its usage minimized. A better alternative is to disable the timing arc of the cell in the -through list, using the set_disable_timing command explained later in this chapter.

13.2.2.1 Helpful Hints for Setting False Paths

Timing exceptions impact the run-time. Setting multiple false paths in a design causes PT to slow down even further. Designers often specify false paths without regard to proper usage, thereby inadvertently impacting the run-time. The following suggestions are provided to help the designer in properly defining the false paths:

a) Avoid using wildcard characters when defining a false path. Failing to do so may result in PT generating a large number of false paths. For example:

pt_shell> set_false_path -from ififo_reg*/CP \
                         -to ofifo_reg*/D

In the above case, if ififo_reg and ofifo_reg are each part of a 16-bit register bank, PT will generate a large number of unnecessary false paths. Disabling the timing arc of a common cell that is shared by the above paths is a better approach. The timing arc is disabled using the set_disable_timing command, explained in the next section.

b) Avoid using the -through option for multiple false paths. Try finding a common cell that is shared by a group of identified false paths, and disable the timing arc of this cell through the set_disable_timing command.

c) Do not define false paths on the registers belonging to separate asynchronous clock domains. For instance, if there are two asynchronous clocks (say, CLK1 and CLK2) then the following command should be avoided:

pt_shell> set_false_path -from [all_registers -clock CLK1] \
                         -to [all_registers -clock CLK2]

The above command forces PT to enumerate every register in the design, thereby causing a big impact on the run-time. A superior alternative is to set the false paths on the clocks themselves, rather than on the registers. Doing this prevents PT from enumerating all the registers in the design, so little or no impact on the run-time is observed. This is the preferred and efficient method of defining the asynchronous behavior of two clocks in PT. For example:

pt_shell> set_false_path -from [get_clocks CLK1] \
                         -to [get_clocks CLK2]

pt_shell> set_false_path -from [get_clocks CLK2] \
                         -to [get_clocks CLK1]

13.3 Disabling Timing Arcs

PT automatically disables timing paths that cause timing loops, in order to complete the STA on a design. However, designers sometimes find it necessary to disable other timing paths for various reasons, the most prevalent being the need for PT to choose the correct timing path at all times. The timing arcs may be disabled by individually disabling the timing arc of a cell, or by performing case analysis on an entire design.

13.3.1 Disabling Timing Arcs Individually

During STA, it sometimes becomes necessary to disable the timing arc of a particular cell, in order to prevent PT from using that arc while calculating the path delay. The need to disable the timing arc arises from the fact that, in order to calculate the delay of a particular cell, PT uses the timing arc that produces the largest delay. This is sometimes undesired and produces false delay values. This is explained in detail in Chapter 4.

Another reason for disabling the timing arc of an individual cell is to prevent PT from choosing the wrong timing path. Figure 13-3 illustrates a case where the control input (bist_mode) is used to select between the signals bist_sig and func_sig, which are inputs to the multiplexer, MUXD1. The bist_sig signal is selected to propagate when the bist_mode signal is low, while the func_sig signal is allowed to pass when the bist_mode signal is high. During normal mode (functional mode), the bist_sig signal is blocked, while the func_sig signal is allowed to propagate. However, during test-mode (e.g., for testing BIST logic), the bist_sig signal is selected to pass through, while the func_sig signal is blocked. The application of this mux is described in detail in Chapter 8 (Figure 8-5), where it is used to bypass the input signals to the RAM, so that the logic previously shadowed by the RAM, and thus un-scannable, can be made scannable.

Three timing arcs exist for this cell: from A1 to Z, from A2 to Z, and from S to Z. Only the first two arcs are shown in the above figure for the sake of clarity. While performing STA to check the timing in functional mode, unless the user isolates the path using the -through option of the report_timing command, PT may choose the wrong path (going through A2 to Z), thereby generating a false path delay timing report. Therefore it is prudent that the timing arc from A2 to Z of the cell MUXD1 (instance name U1) be disabled during functional mode STA. This is performed using the following pt_shell command:

pt_shell> set_disable_timing -from A2 -to Z {U1}

13.3.2 Case Analysis

An alternate solution to the above scenario is to perform case analysis on the design. By setting a logic value on the bist_mode signal, all timing arcs related to the bist_mode signal are disabled/enabled. In the above case, using the following command disables the timing arc from A2 to Z:

pt_shell> set_case_analysis 1 bist_mode

The logic 1 value for the bist_mode signal forces PT to disable the timing arc from A2 to Z and enables the signal func_sig to propagate. By changing this value to 0, the arc from A1 to Z is disabled and the bist_sig signal is allowed to propagate.

Although both the set_disable_timing and set_case_analysis commands perform the same function of disabling timing arcs, the case analysis approach is superior for designs containing many such situations. For instance, a single command is used to analyze the entire design in either the normal mode or the test mode. However, the set_disable_timing command is useful for disabling the timing arc of an individual cell when performing STA.
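The effect of case analysis on the mux of Figure 13-3 can be pictured with a small model. The following Python sketch is only an illustration and not PT code; it assumes that func_sig drives A1, bist_sig drives A2 and bist_mode drives S (as implied by the commands above), and the function name active_arcs is hypothetical.

```python
# Toy model of case analysis on the MUXD1 cell of Figure 13-3.
# Assumption: A1 carries func_sig, A2 carries bist_sig, S is driven by bist_mode.
MUX_ARCS = [("A1", "Z"), ("A2", "Z"), ("S", "Z")]


def active_arcs(bist_mode=None):
    """Return the timing arcs left enabled for a constant value on S.

    bist_mode=None means that no case analysis is applied, so every arc is timed.
    """
    if bist_mode is None:
        return list(MUX_ARCS)
    # A constant select pin never toggles, so the S -> Z arc is not timed,
    # and only the selected data input can reach Z.
    selected = "A1" if bist_mode == 1 else "A2"
    return [(src, dst) for (src, dst) in MUX_ARCS if src == selected]


print(active_arcs())    # no case analysis: all three arcs are timed
print(active_arcs(1))   # functional mode: only A1 -> Z remains
print(active_arcs(0))   # test mode: only A2 -> Z remains
```

A single constant on bist_mode therefore removes the unwanted arc of every cell controlled by that signal, which is why the case analysis approach scales better than disabling arcs cell by cell.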

13.4 Environment and Constraints

Apart from slight syntax differences, the environment and constraint settings for PT are the same as those used for DC. The following commands exemplify these settings:

pt_shell> set_wire_load_model -name <wire-load model name>

pt_shell> set_wire_load_mode < top | enclosed | segmented >

pt_shell> set_operating_conditions <operating conditions name>

pt_shell> set_load 50 [all_outputs]

pt_shell> set_input_delay 10.0 -clock <clock name> [all_inputs]

pt_shell> set_output_delay 10.0 -clock <clock name> [all_outputs]

Although PT provides a multitude of options for the above commands, most designers only use a limited set of options, as shown above. Users are advised to refer to the PT User Guide for full details regarding additional options available for each of the above commands.

Since the behavior and function of these commands are the same as the commands used for DC, no explanation is given here. The DC commands that are related to each of the above commands are explained in detail in Chapter 6.

13.4.1 Operating Conditions – A Dilemma

In general, the design is analyzed for setup-time violations utilizing the worst-case operating conditions, while the best-case operating conditions are used to analyze the design for hold-time violations.

The reason for using the worst-case operating conditions to perform setup-time analysis is that the delay values of each cell in the library depict the delays (usually large) of a device operating under the worst-case conditions (maximum temperature, low voltage and other worst-case process parameters). The large delay values cause the data-flow to slow down, which may result in a setup-time failure for a particular flop.

An opposite effect occurs for the data-flow when the design uses the best-case operating conditions for hold-time STA. In this case, the delay values (small) of each cell in the technology library depict the best-case operating conditions (minimum temperature, high voltage and other best-case process parameters). Therefore, the data-flow now encounters less delay on the way to its destination, i.e., the data arrives faster than before, which may cause hold-time violations at the input of the register.

By analyzing the design at both corners of the operating conditions, a time-window is created that states: if the device operates within the range defined by both operating conditions, the device will operate successfully.
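The two-corner argument above can be illustrated with a few lines of arithmetic. The Python sketch below is a simplified model, not the way PT computes delays: it assumes that a single hypothetical derating factor scales a nominal path delay to each corner, and that both flops share an ideal clock so that the clock latency cancels. The factors 1.3 and 0.7 and all path values are invented for illustration.

```python
# Hypothetical derating factors for the two library corners (not from the text).
WORST_CASE_FACTOR = 1.3   # slow process, low voltage, maximum temperature
BEST_CASE_FACTOR = 0.7    # fast process, high voltage, minimum temperature


def corner_slacks(nominal_delay, period, setup_time, hold_time):
    """Return (setup_slack, hold_slack) in ns for one flop-to-flop path.

    Setup is checked with the slow (worst-case) delay and hold with the
    fast (best-case) delay. Both flops are assumed to see the same ideal clock.
    """
    slow_arrival = nominal_delay * WORST_CASE_FACTOR
    fast_arrival = nominal_delay * BEST_CASE_FACTOR
    setup_slack = period - setup_time - slow_arrival
    hold_slack = fast_arrival - hold_time
    return round(setup_slack, 2), round(hold_slack, 2)


# A long path is limited by setup-time at the worst-case corner ...
print(corner_slacks(nominal_delay=16.0, period=20.0, setup_time=0.5, hold_time=0.3))
# ... while a short path is limited by hold-time at the best-case corner.
print(corner_slacks(nominal_delay=0.4, period=20.0, setup_time=0.5, hold_time=0.3))
```

A path that shows non-negative slack in both checks lies inside the time-window described above.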

13.5 Pre-Layout

After successful synthesis, the netlist obtained must be statically analyzed to check for timing violations. The timing violations may consist of setup-time and/or hold-time violations.

The design was synthesized with emphasis on meeting the setup-time requirements, therefore you may encounter very few setup-time violations, if any. However, hold-time violations will generally occur at this stage. This is due to the data arriving too fast at the input of sequential cells with respect to the clock.

If the design is failing setup-time requirements, then you have no other option but to re-synthesize the design, targeting the violating path for further optimization. This may involve grouping the violating paths or over-constraining the entire sub-block that had violations. However, if the design is failing hold-time requirements, you may either fix these violations at the pre-layout level, or postpone this step until after layout. Many designers prefer the latter approach for minor hold-time violations (also used here), since the pre-layout synthesis and timing analysis use the statistical wire-load models, and fixing the hold-time violations at the pre-layout level may result in setup-time violations for the same path after layout. However, if the wire-load models truly reflect the post-routed delays, then it is prudent to fix the hold-time violations at this stage. In any case, it must be noted that gross hold-time violations should be fixed at the pre-layout level, in order to minimize the number of hold-time fixes that may result after the layout.


13.5.1 Pre-Layout Clock Specification

In the pre-layout phase, the clock tree information is absent from the netlist. Therefore, it is necessary to estimate the post-route clock-tree delays up-front, during the pre-layout phase, in order to perform adequate STA. In addition, the estimated clock transition should also be defined in order to prevent PT from calculating false delays (usually large) for the driven gates. The cause of large delays is usually attributed to the high fanout normally associated with the clock networks. The large fanout leads to slow input transition times computed for the clock driving the endpoint gates, which in turn results in PT computing unusually large delay values for the endpoint gates. To prevent this situation, it is recommended that a fixed clock transition value be specified at the source.

The following commands may be used to define the clock during the pre-layout phase of the design.

pt_shell> create_clock -period 20 -waveform [list 0 10] [list CLK]

pt_shell> set_clock_latency 2.5 [get_clocks CLK]

pt_shell> set_clock_transition 0.2 [get_clocks CLK]

pt_shell> set_clock_uncertainty 1.2 -setup [get_clocks CLK]

pt_shell> set_clock_uncertainty 0.5 -hold [get_clocks CLK]

The above commands specify the port CLK as type clock having a period of 20ns, the clock latency as 2.5ns, and a fixed clock transition value of 0.2ns. The clock latency value of 2.5ns signifies that the clock delay from the input port CLK to all the endpoints is fixed at 2.5ns. In addition, the 0.2ns value of the clock transition forces PT to use the 0.2ns value, instead of calculating its own. The clock skew is approximated with 1.2ns specified for the setup-time, and 0.5ns for the hold-time. Using this approach during pre-layout yields a realistic approximation to the post-layout clock network results.


13.5.2 Timing Analysis

The following script gathers all the information provided above and may be used to perform the setup-time STA on a design.

PT script for pre-layout setup-time STA

# Define the design and read the netlist only
set active_design <design name>

read_db -netlist_only $active_design.db

# or use the following command to read the Verilog netlist.
# read_verilog $active_design.v

current_design $active_design

set_wire_load_model -name <wire-load model name>
set_wire_load_mode < top | enclosed | segmented >

set_operating_conditions <worst-case operating conditions>

# Assuming the 50pf load requirement for all outputs
set_load 50.0 [all_outputs]

# Assuming the clock name is CLK with a period of 30ns.
# The latency and transition are frozen to approximate the
# post-routed values.

create_clock -period 30 -waveform [list 0 15] CLK
set_clock_latency 3.0 [get_clocks CLK]
set_clock_transition 0.2 [get_clocks CLK]
set_clock_uncertainty 1.5 -setup [get_clocks CLK]

# The input and output delay constraint values are assumed
# to be derived from the design specifications.

set_input_delay 15.0 -clock CLK [all_inputs]
set_output_delay 10.0 -clock CLK [all_outputs]

# Assuming a Tcl variable TESTMODE has been defined.
# This variable is used to switch between the normal-mode and
# the test-mode for static timing analysis. Case analysis for
# normal-mode is enabled when TESTMODE = 1, while
# case analysis for test-mode is enabled when TESTMODE = 0.
# The bist_mode signal is used from the example illustrated in
# Figure 13-3.

set TESTMODE [getenv TESTMODE]

if {$TESTMODE == 1} {
    set_case_analysis 1 [get_port bist_mode]
} else {
    set_case_analysis 0 [get_port bist_mode]
}

# The following command determines the overall health
# of the design.

report_constraint -all_violators

# Extensive analysis is performed using the following commands.
report_timing -to [all_registers -data_pins]
report_timing -to [all_outputs]

Also, specification of the startpoint and the endpoint for the -from and the -to options of the report_timing command may be used to target selective paths. In addition, further isolation of the selected path may be achieved by using the -through option.

By default, PT performs the maximum delay analysis, therefore specification of the max value for the -delay_type option of the report_timing command is not needed. However, in order to display all timing paths of the design, the -nworst and/or -max_paths options may be utilized.

As mentioned in the previous chapter, the report_constraint command is used to determine the overall health of the design. This command should be initially used to check for DRC violations (max_transition, max_capacitance, max_fanout, etc.). In addition, this command may also be used to generate a broad spectrum of setup/hold-time timing reports for the entire design. Note that the timing report produced by the report_constraint command does not include a full path timing report. It only produces a summary report for all violating paths per endpoint (assuming that the -all_violators option is used).

The report_timing command is used to analyze the design in more detail. This command produces a timing report that includes the full path from the startpoint to the endpoint. This command is useful for analyzing the failing path segments of the design. For instance, it is possible to narrow down the cause of the failure by utilizing the -capacitance and -net options of this command.

13.6 Post-Layout

The post-layout steps involve analyzing the design for timing with actual delays back annotated. These delays are obtained by extracting the layout database. The analysis is performed on the post-routed netlist that contains the clock tree information. Various methods exist for porting the clock tree to DC and PT, and have been explained in detail in Chapter 9. Let us assume that the modified netlist exists in db format.

At this stage, a comprehensive STA should be performed on the design. This involves analyzing the design for both the setup and hold-time requirements. In general, the design will pass timing with ample setup-time, but may fail hold-time requirements. In order to fix the hold-time violations, several methods may be employed. These are explained in Chapter 9. After incorporating the hold-time fixes, the design must be analyzed again to verify the timing of the fixes.

13.6.1 What to Back Annotate?

One of the most frequent questions asked by designers is: What should I back annotate to PT, and in what format?


Chapter 9 discusses various types of layout database extraction and associated formats. The pros and cons of each format are discussed at length. It is recommended that the following types of information be generated from the layout tool for back annotation to PT in order to perform STA:

Net RC delays in SDF format.

Capacitive net loading values in set_load format.

Parasitic information for clock and other critical nets in DSPF, RSPF or SPEF file formats.

The following PT commands are used to back annotate the above information:

read_sdf: As the name suggests, this command is used to read the SDF file. For example:

pt_shell> read_sdf rc_delays.sdf

source: PT uses this command to read external files in Tcl format. Therefore, this command may be used to back annotate the net capacitances file in set_load file format. For example:

pt_shell> source capacitance.pt

read_parasitics: This command is utilized by PT to back annotate the parasitics in DSPF, RSPF and SPEF formats. You do not need to specify the format of the file. PT automatically detects it. For example:

pt_shell> read_parasitics clock_info.spf

13.6.2 Post-Layout Clock Specification

Similar to pre-layout, the post-layout timing analysis uses the same commands, except that this time the clock is propagated through the entire clock network. This is because the clock network now comprises the clock tree buffers. Thus the clock latency and skew are dependent on these buffers. Therefore, fixing the clock latency and transition to a specified value is not required for post-route clock specification. The following commands exemplify the post-route clock specification.

pt_shell> create_clock -period 20 -waveform [list 0 10] [list CLK]

pt_shell> set_propagated_clock [get_clocks CLK]

As the name suggests, the set_propagated_clock command propagates the clock throughout the clock network. Since the clock tree information is now present in the design, the delay, skew, and transition time of the clock are calculated by PT from the gates comprising the clock network.

13.6.3 Timing Analysis

Predominantly, the timing of the design is dependent upon clock latency and skew, i.e., the clock is the reference for all other signals in the design. It is therefore prudent to perform the clock skew analysis before attempting to analyze the whole design. A useful Tcl script is provided by Synopsys through their on-line support on the web, called SolvNET. You may download this script and run the analysis before proceeding. If the Tcl script is not available, then designers may write their own script to generate a report for the clock delay, starting from the source point of the clock and ending at all the endpoints. The clock skew and total delay may be determined by parsing the generated report.
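Once the clock delay to each endpoint has been collected, the skew and total delay follow from simple arithmetic. The Python sketch below is a minimal illustration and is not the SolvNET script; it assumes the per-endpoint clock delays have already been parsed from a report into a dictionary. The function name clock_skew_summary and the latency values are hypothetical.

```python
def clock_skew_summary(latencies):
    """Summarize clock delays (ns) measured from the clock source to each endpoint.

    latencies maps a clock-pin name to its clock delay.
    Skew is taken here as the largest delay minus the smallest delay.
    """
    longest = max(latencies.values())
    shortest = min(latencies.values())
    return {
        "max_latency": longest,
        "min_latency": shortest,
        "skew": round(longest - shortest, 3),
    }


# Hypothetical clock delays for three endpoints of clock tck.
latencies = {
    "state_block/st_reg9/CP": 1.92,
    "state_block/bp_reg2/CP": 1.54,
    "ir_block/ir_reg0/CP": 2.00,
}

summary = clock_skew_summary(latencies)
print(summary["max_latency"], summary["min_latency"], summary["skew"])
```

If the reported skew exceeds the budget assumed during pre-layout, the clock tree should be revisited before the rest of the design is analyzed.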

Although setting the clock uncertainty for post-layout STA is not needed, some designers prefer to specify a small amount of clock uncertainty, in order to produce a robust design.

Let us assume that the clock latency and skew are within limits. The next step is to perform the static timing on the design, in order to check the setup and hold-time violations. The setup-time analysis is similar to that performed for pre-layout, the only difference being the clock specification (propagate the clock) as described before. In addition, during post-route STA, the extracted information from the layout database is back annotated to the design.


The following script illustrates the process of performing post-route setup-time STA on a design. The items in bold reflect the differences between the pre-layout and post-layout timing analysis.

PT script for post-layout setup-time STA

# Define the design and read the netlist only
set active_design <design name>

read_db -netlist_only $active_design.db

# or use the following command to read the Verilog netlist.
# read_verilog $active_design.v

current_design $active_design

set_wire_load_model -name <wire-load model name>
set_wire_load_mode < top | enclosed | segmented >

# Use worst-case operating conditions for setup-time analysis
set_operating_conditions <worst-case operating conditions>

# Assuming the 50pf load requirement for all outputs
set_load 50.0 [all_outputs]

# Back annotate the worst-case (extracted) layout information.
source capacitance_wrst.pt           ;# actual parasitic capacitances
read_sdf rc_delays_wrst.sdf          ;# actual RC delays
read_parasitics clock_info_wrst.spf  ;# clock network data

# Assuming the clock name is CLK with a period of 30ns.
# The clock is propagated through the clock tree, so the latency
# and transition are computed by PT from the post-routed netlist.
# A small value of clock uncertainty is used for the setup-time.

create_clock -period 30 -waveform [list 0 15] CLK
set_propagated_clock [get_clocks CLK]
set_clock_uncertainty 0.5 -setup [get_clocks CLK]

# The input and output delay constraint values are assumed
# to be derived from the design specifications.

set_input_delay 15.0 -clock CLK [all_inputs]
set_output_delay 10.0 -clock CLK [all_outputs]

# Assuming a Tcl variable TESTMODE has been defined.
# This variable is used to switch between the normal-mode and
# the test-mode for static timing analysis. Case analysis for
# normal-mode is enabled when TESTMODE = 1, while
# case analysis for test-mode is enabled when TESTMODE = 0.
# The bist_mode signal is used from the example illustrated in
# Figure 13-3.

set TESTMODE [getenv TESTMODE]

if {$TESTMODE == 1} {
    set_case_analysis 1 [get_port bist_mode]
} else {
    set_case_analysis 0 [get_port bist_mode]
}

# The following command determines the overall health
# of the design.

report_constraint -all_violators

# Extensive analysis is performed using the following commands.
report_timing -to [all_registers -data_pins]
report_timing -to [all_outputs]

As mentioned earlier, the design is analyzed for hold-time violations using the best-case operating conditions. The following script summarizes all the information provided above and may be used to perform the post-route hold-time STA on a design. The items in bold reflect the differences between the setup-time and the hold-time analysis.


PT script for post-layout hold-time STA

# Define the design and read the netlist only
set active_design <design name>

read_db -netlist_only $active_design.db

# or use the following command to read the Verilog netlist.
# read_verilog $active_design.v

current_design $active_design

set_wire_load_model -name <wire-load model name>
set_wire_load_mode < top | enclosed | segmented >

# Use best-case operating conditions for hold-time analysis
set_operating_conditions <best-case operating conditions>

# Assuming the 50pf load requirement for all outputs
set_load 50.0 [all_outputs]

# Back annotate the best-case (extracted) layout information.
source capacitance_best.pt           ;# actual parasitic capacitances
read_sdf rc_delays_best.sdf          ;# actual RC delays
read_parasitics clock_info_best.spf  ;# clock network data

# Assuming the clock name is CLK with a period of 30ns.
# The clock is propagated through the clock tree, so the latency
# and transition are computed by PT from the post-routed netlist.
# A small value of clock uncertainty is used for the hold-time.

create_clock -period 30 -waveform [list 0 15] CLK
set_propagated_clock [get_clocks CLK]
set_clock_uncertainty 0.2 -hold [get_clocks CLK]

# The input and output delay constraint values are assumed
# to be derived from the design specifications.

set_input_delay 15.0 -clock CLK [all_inputs]
set_output_delay 10.0 -clock CLK [all_outputs]

# Assuming a Tcl variable TESTMODE has been defined.
# This variable is used to switch between the normal-mode and
# the test-mode for static timing analysis. Case analysis for
# normal-mode is enabled when TESTMODE = 1, while
# case analysis for test-mode is enabled when TESTMODE = 0.
# The bist_mode signal is used from the example illustrated in
# Figure 13-3.

set TESTMODE [getenv TESTMODE]

if {$TESTMODE == 1} {
    set_case_analysis 1 [get_port bist_mode]
} else {
    set_case_analysis 0 [get_port bist_mode]
}

# The following command determines the overall health
# of the design.

report_constraint -all_violators

# Extensive analysis is performed using the following commands.
report_timing -to [all_registers -data_pins] \
              -delay_type min

report_timing -to [all_outputs] -delay_type min

13.7 Analyzing Reports

The following sub-sections illustrate the timing report generated by the report_timing command, both for pre-layout and post-layout analysis. A clock period of 30ns is assumed for the clock named tck of an example tap_controller design.


13.7.1 Pre-Layout Setup-Time Analysis Report

Example 13.1 illustrates the STA report generated during the pre-layout phase. The ideal setting for the clock is assumed using the pre-layout clock specification commands.

The following command was used to instruct PT to display a timing report for the worst path (maximum delay), starting at the input port tdi and ending at the input pin of a flip-flop.

pt_shell> report_timing -from tdi -to [all_registers -data_pins]

The default settings were used, i.e., the -delay_type option was not specified, therefore PT performs the setup-time analysis on the design by assuming the max setting for the -delay_type option. Furthermore, PT uses the default values of the -nworst and -max_paths options. This ensures that the timing report for a single worst path (minimum slack value) is generated. All other paths starting from the tdi input port and ending at other flip-flops will have a higher slack value, and thus will not be displayed.

Example 13.1

Report : timing
         -path full
         -delay max
         -max_paths 1
Design : tap_controller
Version : 1998.08-PT2
Date : Tue Nov 17 11:16:18 1998

Startpoint: tdi (input port clocked by tck)
Endpoint: ir_block/ir_reg0
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: max

Point                                     Incr      Path
-----------------------------------------------------------------
clock tck (rise edge)                     0.00      0.00
clock network delay (ideal)               0.00      0.00
input external delay                     15.00     15.00 r
tdi (in)                                  0.00     15.00 r
pads/tdi (pads)                           0.00     15.00 r
pads/tdi_pad/Z (PAD1X)                    1.32     16.32 r
pads/tdi_signal (pads)                    0.00     16.32 r
ir_block/tdi (ir_block)                   0.00     16.32 r
ir_block/U1/Z (AND2D4)                    0.28     16.60 r
ir_block/U2/ZN (INV0D2)                   0.33     16.93 f
ir_block/U1234/Z (OR2D0)                  1.82     18.75 f
ir_block/U156/ZN (NOR3D2)                 1.05     19.80 r
ir_block/ir_reg0/D (DFF1X)                0.00     19.80 r
data arrival time                                  19.80

clock tck (rise edge)                    30.00     30.00
clock network delay (ideal)               2.50     32.50
ir_block/ir_reg0/CP (DFF1X)                        32.50 r
library setup time                       -0.76     31.74
data required time                                 31.74
-----------------------------------------------------------------
data required time                                 31.74
data arrival time                                 -19.80
-----------------------------------------------------------------
slack (MET)                                        11.94

It is clear from the above report that the design meets the required setup-time with a slack value of 11.94ns. This means that there is a margin of at least 11.94ns before the setup-time of the endpoint flop is violated.
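The slack reported in Example 13.1 can be reproduced by hand from the numbers in the report. The following Python sketch simply repeats that arithmetic; the function name setup_slack is hypothetical, and the values are copied from the report above.

```python
def setup_slack(clock_edge, clock_latency, setup_time, data_arrival):
    """Setup slack (ns) = data required time - data arrival time."""
    data_required = clock_edge + clock_latency - setup_time
    return round(data_required - data_arrival, 2)


# Values copied from Example 13.1: capture edge at 30ns, ideal clock
# network delay of 2.5ns, library setup time of 0.76ns, arrival at 19.80ns.
slack = setup_slack(clock_edge=30.00, clock_latency=2.50,
                    setup_time=0.76, data_arrival=19.80)
print(f"setup slack = {slack:.2f} ns")   # prints "setup slack = 11.94 ns"
```

The data required time is 30.00 + 2.50 - 0.76 = 31.74ns, and subtracting the arrival time of 19.80ns gives the reported slack.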

13.7.2 Pre-Layout Hold-Time Analysis Report

Example 13.2 illustrates the STA report generated during the pre-layout phase. The ideal setting for the clock is assumed using the pre-layout clock specification commands.


In order to perform hold-time STA, the following command was used to instruct PT to display a timing report for a minimum delay path, existing between two flip-flops.

pt_shell> report_timing -from [all_registers -clock_pins] \
                        -to [all_registers -data_pins] \
                        -delay_type min

In the above case, the -delay_type option was specified with the min value, thus informing PT to display the best-case timing report. The default values of all other options were maintained.

Example 13.2

Report : timing
         -path full
         -delay min
         -max_paths 1
Design : tap_controller
Version : 1998.08-PT2
Date : Tue Nov 17 11:16:18 1998

Startpoint: state_block/st_reg9
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/bp_reg2
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: min

Point                                     Incr      Path
-----------------------------------------------------------------
clock tck (rise edge)                     0.00      0.00
clock network delay (ideal)               2.50      2.50
state_block/st_reg9/CP (DFF1X)            0.00      2.50 r
state_block/st_reg9/Q (DFF1X)             0.05      2.55 r
state_block/U15/Z (BUFF4X)                0.15      2.70 r
state_block/bp_reg2/D (DFF1X)             0.10      2.80 r
data arrival time                                   2.80

clock tck (rise edge)                     0.00      0.00
clock network delay (ideal)               2.50      2.50
state_block/bp_reg2/CP (DFF1X)                      2.50 r
library hold time                         0.50      3.00
data required time                                  3.00
-----------------------------------------------------------------
data required time                                  3.00
data arrival time                                  -2.80
-----------------------------------------------------------------
slack (VIOLATED)                                   -0.20

A negative slack value in the above report implies that the hold-time of the endpoint flop is violated by 0.20ns. This is due to the data arriving too fast with respect to the clock.

To fix the hold-time for the above path, the setup-time analysis should also be performed on the same path in order to find the overall slack margin. Doing this provides a time frame in which the data can be manipulated.

For the above example, if the setup-time slack value is large (say, 10ns) then the data can be delayed by 0.20ns or more (say, 1ns), thus providing ample hold-time at the endpoint flop. However, if the setup-time slack value is small (say, 0.50ns) then a very narrow margin of 0.30ns (0.50ns - 0.20ns) exists. Delaying the data by an exact amount of 0.20ns will produce the desired results, leaving 0.30ns as the setup-time slack. However, the minute time window of 0.30ns makes it extremely difficult for designers to fix the timing violation, i.e., to delay the data just enough so that it does not violate the setup-time requirements. In this case, the logic may need to be re-synthesized and the violating path targeted for further optimization.
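The window available for a hold-time fix can be worked out from the two slack values. The Python sketch below is a simplified illustration: it assumes that the delay added to the data path is the same at both operating corners, which is not true in practice. The function names hold_slack and hold_fix_window are hypothetical.

```python
def hold_slack(clock_latency, hold_time, data_arrival):
    """Hold slack (ns) = data arrival time - data required time."""
    return round(data_arrival - (clock_latency + hold_time), 2)


def hold_fix_window(hold_slack_ns, setup_slack_ns):
    """Return (min_delay, max_delay) that may be added to the data path.

    Returns None when no amount of added delay satisfies both checks.
    Simplification: the added delay is assumed equal at both corners.
    """
    min_delay = max(0.0, -hold_slack_ns)   # just enough to clear the hold violation
    max_delay = setup_slack_ns             # more than this violates the setup-time
    if min_delay > max_delay:
        return None
    return round(min_delay, 2), round(max_delay, 2)


# Example 13.2: arrival at 2.80ns, capture clock at 2.50ns, hold time of 0.50ns.
violation = hold_slack(clock_latency=2.50, hold_time=0.50, data_arrival=2.80)
print(violation)                          # hold slack of the failing path
print(hold_fix_window(violation, 10.0))   # large setup slack: wide window
print(hold_fix_window(violation, 0.50))   # small setup slack: narrow window
```

With a setup-time slack of 0.50ns the window spans only 0.20ns to 0.50ns, which is the 0.30ns margin mentioned above.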


13.7.3 Post-Layout Setup-Time Analysis Report

The same command that is used for pre-layout setup-time STA also performs the post-layout setup-time analysis. However, the report generated is slightly different, in the sense that PT uses asterisks to denote the delays that are back annotated.

Example 13.3 illustrates the post-layout timing report generated by PT to perform the setup-time STA. The same path segment shown in Example 13.1 (the pre-layout setup-time STA) is targeted to demonstrate the differences between the pre-layout and the post-layout timing reports.

Example 13.3

Report : timing
         -path full
         -delay max
         -max_paths 1
Design : tap_controller
Version : 1998.08-PT2
Date : Wed Nov 18 12:14:18 1998

Startpoint: tdi (input port clocked by tck)
Endpoint: ir_block/ir_reg0
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: max

Point                                     Incr      Path
-----------------------------------------------------------------
clock tck (rise edge)                     0.00      0.00
clock network delay (propagated)          0.00      0.00
input external delay                     15.00     15.00 r
tdi (in)                                  0.00     15.00 r
pads/tdi (pads)                           0.00     15.00 r
pads/tdi_pad/Z (PAD1X)                    1.30     16.30 r
pads/tdi_signal (pads)                    0.00     16.30 r
ir_block/tdi (ir_block)                   0.00     16.30 r
ir_block/U1/Z (AND2D4)                    0.22*    16.52 r
ir_block/U2/ZN (INV0D2)                   0.24*    16.76 f
ir_block/U1234/Z (OR2D0)                  0.56*    17.32 f
ir_block/U156/ZN (NOR3D2)                 0.83*    18.15 r
ir_block/ir_reg0/D (DFF1X)                1.03*    19.18 r
data arrival time                                  19.18

clock tck (rise edge)                    30.00     30.00
clock network delay (propagated)          2.00     32.00
ir_block/ir_reg0/CP (DFF1X)                        32.00 r
library setup time                       -0.76     31.24
data required time                                 31.24
-----------------------------------------------------------------
data required time                                 31.24
data arrival time                                 -19.18
-----------------------------------------------------------------
slack (MET)                                        12.06

By comparison, the post-layout timing results improve from a slack value of 11.94 (in Example 13.1) to 12.06. This variation is attributed to the difference between the wire-load models used during pre-layout STA and the actual extracted back-annotated data from the layout. In this case, the wire-load models are slightly pessimistic as compared to the post-routed results.

Another difference between the pre-layout and the post-layout results is the propagation of the clock. In the pre-layout timing report, an ideal clock was assumed. However, during the post-layout STA the clock is propagated, thereby accounting for real delays. This is shown in the above report as "clock network delay (propagated)".

In the pre-layout phase, an ideal clock network delay of 2.5ns was assumed. The post-route STA results indicate that the clock is actually faster than previously estimated, i.e., the clock network delay value is 2.0ns instead of 2.5ns. This provides an indication of the post-routed clock network delay values. Therefore, the next time (next iteration, maybe) the design is analyzed in the pre-route phase, the clock network delay value of 2.0ns should be used to provide a closer approximation to the post-routed results.
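The change in slack between Example 13.1 and Example 13.3 is the net result of two opposing effects, which the following Python sketch separates. It is only a worked calculation using the values from the two reports; the dictionary and variable names are hypothetical.

```python
# Arrival and required times (ns) copied from Examples 13.1 and 13.3.
pre_layout = {"arrival": 19.80, "required": 31.74}
post_layout = {"arrival": 19.18, "required": 31.24}


def slack(report):
    """Setup slack (ns) = data required time - data arrival time."""
    return round(report["required"] - report["arrival"], 2)


# The data arrives earlier after layout (wire-load models were pessimistic) ...
arrival_gain = round(pre_layout["arrival"] - post_layout["arrival"], 2)
# ... but the capture clock also arrives earlier (2.0ns instead of 2.5ns).
required_loss = round(pre_layout["required"] - post_layout["required"], 2)
net_change = round(slack(post_layout) - slack(pre_layout), 2)

print(arrival_gain, required_loss, net_change)
```

The data path gains 0.62ns, while the earlier capture clock gives back 0.50ns, leaving the net improvement of 0.12ns seen in the slack values.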

13.7.4 Post-Layout Hold-Time Analysis Report

The same command that is used for pre-layout hold-time STA also performs the post-layout hold-time analysis. However, the report generated is slightly different, in the sense that PT uses asterisks to denote the delays that are back annotated. In addition, the clock network delay is propagated instead of assuming ideal delays.

Example 13.4 illustrates the post-layout timing report generated by PT to perform the hold-time STA. The same path segment shown in Example 13.2 (the pre-layout hold-time STA) is targeted to demonstrate the differences between the pre-layout and the post-layout timing reports.

Example 13.4

Report : timing
         -path full
         -delay min
         -max_paths 1
Design : tap_controller
Version : 1998.08-PT2
Date : Tue Nov 17 11:16:18 1998

Startpoint: state_block/st_reg9
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/bp_reg2
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: min

Point                                     Incr      Path
-----------------------------------------------------------------
clock tck (rise edge)                     0.00      0.00
clock network delay (propagated)          1.92      1.92
state_block/st_reg9/CP (DFF1X)            0.00      1.92 r
state_block/st_reg9/Q (DFF1X)             0.18      2.10 r
state_block/U15/Z (BUFF4X)                0.04*     2.14 r
state_block/bp_reg2/D (DFF1X)             0.06*     2.20 r
data arrival time                                   2.20

clock tck (rise edge)                     0.00      0.00
clock network delay (propagated)          1.54      1.54
state_block/bp_reg2/CP (DFF1X)                      1.54 r
library hold time                         0.50      2.04
data required time                                  2.04
-----------------------------------------------------------------
data required time                                  2.04
data arrival time                                  -2.20
-----------------------------------------------------------------
slack (MET)                                         0.16

In the above case, the hold-time for the endpoint flop is met with a margin of 0.16ns to spare. Notice the difference in clock latency between the startpoint flop (1.92ns) and the endpoint flop (1.54ns). The difference in latency gives rise to the clock skew. Generally, a small clock skew value is acceptable, however a large clock skew may result in race conditions within the design. The race conditions cause the wrong data to be clocked by the endpoint flop. Therefore, it is advisable to minimize the clock skew in order to avoid such problems.
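The influence of the clock skew on this path can be checked by repeating the hold-time arithmetic of Example 13.4 with and without the latency difference. The Python sketch below is a worked calculation only; the function name hold_slack is hypothetical and the zero-skew case is a what-if that does not appear in the report.

```python
def hold_slack(launch_latency, data_delay, capture_latency, hold_time):
    """Hold slack (ns) for a flop-to-flop path with separate clock latencies."""
    arrival = launch_latency + data_delay
    required = capture_latency + hold_time
    return round(arrival - required, 2)


# Data path delay from Example 13.4: CP-to-Q, buffer U15 and the net to D.
data_delay = 0.18 + 0.04 + 0.06

skew = round(1.92 - 1.54, 2)                            # launch minus capture latency
with_skew = hold_slack(1.92, data_delay, 1.54, 0.50)    # as reported
zero_skew = hold_slack(1.92, data_delay, 1.92, 0.50)    # what-if: equal latencies

print(skew, with_skew, zero_skew)
```

Under these numbers the 0.38ns of skew works in favor of the hold-time check, since the launching clock arrives later than the capturing clock; with equal latencies the same path would fail by 0.22ns. Skew in the opposite direction would reduce the hold-time margin by the same amount.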

13.8 Advanced Analysis

This section provides insight into performing advanced STA on the design. Depending upon the situation, designers may analyze the design in detail, utilizing the concepts and techniques described in the following sections.


13.8.1 Detailed Timing Report

Often, a path segment in a design may fail setup and/or hold-time, and it becomes necessary to analyze the design closely in order to find the cause of the problem.

Consider the timing report shown in Example 13.2. The hold-time is failing by 0.20ns. In order to find the cause of the problem, the following command was used:

pt_shell> report_timing -from state_block/st_reg9/CP \
                        -to state_block/bp_reg2/D \
                        -delay_type min \
                        -nets -capacitance -transition_time

In the above command, the additional options -nets, -capacitance and -transition_time are used. Although the above command uses all three options concurrently, these options may also be used independently.

The timing report shown in Example 13.5 is identical to the one shown in Example 13.2, except that it uses the above command to produce a timing report that includes additional information on the fanout, load capacitance and transition time.

Example 13.5

Report  : timing
          -path full
          -delay min
          -max_paths 1
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 11:16:18 1998

Startpoint: state_block/st_reg9
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/bp_reg2
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: min

Point                             Fanout   Cap    Trans   Incr    Path

clock tck (rise edge)                             0.30    0.00    0.00
clock network delay (ideal)                               2.50    2.50
state_block/st_reg9/CP (DFF1X)                    0.30    0.00    2.50 r
state_block/st_reg9/Q (DFF1X)                     0.12    0.05    2.55 r
state_block/n1234 (net)             2      0.04
state_block/U15/Z (BUFF4X)                        0.32    0.15    2.70 r
state_block/n2345 (net)             8      2.08
state_block/bp_reg2/D (DFF1X)                     0.41    0.10    2.80 r
data arrival time                                                 2.80

clock tck (rise edge)                                     0.00    0.00
clock network delay (ideal)                               2.50    2.50
state_block/bp_reg2/CP (DFF1X)                    0.30            2.50 r
library hold time                                         0.50    3.00
data required time                                                3.00

data required time                                                3.00
data arrival time                                                -2.80

slack (VIOLATED)                                                 -0.20

By analyzing the timing report shown in Example 13.5, it can be seen that the cell U15 (BUFF4X) has a fanout of 8, with a load capacitance of 2.08pf. The computed cell delay is 0.15ns. As stated before, the hold-time violation is fixed by delaying the data with respect to the clock. Therefore, if the drive strength of the cell U15 is reduced from 4X to 1X, the delay of the cell U15 will increase, due to the increase in transition time. This increase in delay will contribute towards slowing the entire data path, thus removing the hold-time violation. The resulting timing report is shown in Example 13.6.

Example 13.6

Report  : timing
          -path full
          -delay min
          -max_paths 1
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 11:16:18 1998

Startpoint: state_block/st_reg9
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/bp_reg2
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: min

Point                             Fanout   Cap    Trans   Incr    Path

clock tck (rise edge)                             0.30    0.00    0.00
clock network delay (ideal)                               2.50    2.50
state_block/st_reg9/CP (DFF1X)                    0.30    0.00    2.50 r
state_block/st_reg9/Q (DFF1X)                     0.12    0.05    2.55 r
state_block/n1234 (net)             2      0.04
state_block/U15/Z (BUFF1X)                        1.24    0.40    2.95 r
state_block/n2345 (net)             8      2.08
state_block/bp_reg2/D (DFF1X)                     1.25    0.10    3.05 r
data arrival time                                                 3.05

clock tck (rise edge)                                     0.00    0.00
clock network delay (ideal)                               2.50    2.50
state_block/bp_reg2/CP (DFF1X)                    0.30            2.50 r
library hold time                                         0.50    3.00
data required time                                                3.00

data required time                                                3.00
data arrival time                                                -3.05

slack (MET)                                                       0.05

In the timing report shown above, by reducing the drive strength of the cell U15 from 4X to 1X, an increase in transition time, and therefore an increase in the incremental delay of the gate is achieved. This impacts the overall data path, which results in a positive slack margin of 0.05ns, thus removing the hold-time violation for the endpoint flop.
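The effect of the change can be checked by summing the incremental delays of Examples 13.5 and 13.6. The Python sketch below is a worked illustration only; the helper name arrival_time is an assumption made here, and the delay values are copied from the two reports.

```python
def arrival_time(clock_latency, increments):
    """Data arrival time: launch clock latency plus the path increments."""
    return round(clock_latency + sum(increments), 2)


CLOCK_LATENCY = 2.50      # ideal clock network delay
REQUIRED = 2.50 + 0.50    # capture clock latency + library hold time

# Increments: st_reg9 clock-to-Q, U15 cell delay, increment at bp_reg2/D
before = arrival_time(CLOCK_LATENCY, [0.05, 0.15, 0.10])  # U15 is a BUFF4X
after = arrival_time(CLOCK_LATENCY, [0.05, 0.40, 0.10])   # U15 is a BUFF1X

slack_before = round(before - REQUIRED, 2)
slack_after = round(after - REQUIRED, 2)

print(before, slack_before)  # 2.8 -0.2
print(after, slack_after)    # 3.05 0.05
```

Only the U15 increment differs between the two runs (0.15ns against 0.40ns), and that 0.25ns difference is what turns the slack from -0.20ns into +0.05ns.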

13.8.2 Cell Swapping

PT provides the ability to swap cells in the design, as long as the pinout of the existing cell is identical to the pinout of the replacement cell. This capability allows designers to perform what-if scenarios without leaving the pt_shell session.

In Example 13.6, the cell BUFF1X (having a lower drive strength and identical pinout to BUFF4X) replaced the cell BUFF4X. However, the process of replacement was not discussed. There are two methods to achieve this. The netlist could be modified manually before performing STA; or designers may use the cell swapping capability of PT to perform the what-if STA scenarios on the violating path segments, before manually modifying the netlist.

Manually modifying the netlist before performing the what-if scenarios is certainly a viable approach. However, it is laborious. For the case shown in Example 13.6, first the pt_shell session is terminated, then the netlist is modified manually (BUFF4X replaced with BUFF1X), and finally the pt_shell session is invoked again to re-analyze the previously violating path segment (from st_reg9 to bp_reg2). If the modifications to the netlist do not produce the desired results (the path segment is still violating timing) then the whole process needs to be repeated. This approach is tedious and wasteful.


A preferred alternative is to use the following command to replace the existing cell with another:

pt_shell> swap_cell {U15} [get_lib_cell stdcell_lib/BUFF1X]

The above command can be used within the pt_shell session to replace the existing cell BUFF4X (instanced as U15) with BUFF1X (from the “stdcell_lib” technology library), after which the path segment is re-analyzed to view the effect of the swapping. This provides a faster approach to debugging the design and visualizing the effect of cell swapping without terminating the pt_shell session.

Note that the cell swapping only occurs inside the PT memory. The physical netlist remains unmodified. If the path segment and the rest of the design pass STA, this modification should be incorporated in the netlist by modifying the netlist manually.

13.8.3 Bottleneck Analysis

Sometimes a design may contain multiple path segments that share a common leaf cell. If these path segments are failing timing then changing the drive strength (sizing it up or down) of the common leaf cell may remove the timing violation for all the path segments. PT provides the capability of identifying a common leaf cell that is shared by multiple violating path segments in a design. This is termed bottleneck analysis and is performed by using the report_bottleneck command.

In Example 13.2, a hold-time violation exists for the path segment starting from state_block/st_reg9 and ending at state_block/bp_reg2. However, a hold-time violation also exists (shown in Example 13.7) for the path segment starting from the same startpoint (state_block/st_reg9) but ending at a different endpoint, state_block/enc_reg0.


Example 13.7

Report  : timing
          -path full
          -delay min
          -max_paths 1
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 11:24:10 1998

Startpoint: state_block/st_reg9
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/enc_reg0
          (rising edge-triggered flip-flop clocked by tck)
Path Group: tck
Path Type: min

Point                                   Incr      Path

clock tck (rise edge)                   0.00      0.00
clock network delay (ideal)             2.50      2.50
state_block/st_reg9/CP (DFF1X)          0.00      2.50 r
state_block/st_reg9/Q (DFF1X)           0.05      2.55 r
state_block/U15/Z (BUFF4X)              0.15      2.70 r
state_block/enc_reg0/D (DFF1X)          0.07      2.77 r
data arrival time                                 2.77

clock tck (rise edge)                   0.00      0.00
clock network delay (ideal)             2.50      2.50
state_block/enc_reg0/CP (DFF1X)                   2.50 r
library hold time                       0.50      3.00
data required time                                3.00

data required time                                3.00
data arrival time                                -2.77

slack (VIOLATED)                                 -0.23


In the timing report shown above, the hold-time violation is 0.23ns. Visual inspection of the two timing reports (Example 13.2 and 13.7) reveals that a single cell BUFF4X (instanced as U15) is common to both path segments (from st_reg9 to bp_reg2, and from st_reg9 to enc_reg0). Thus, reducing the drive strength of this cell may eliminate the hold-time violation for both the path segments.

However, this process involves careful visual inspection of all the path segments in the design in an effort to identify the common leaf cell between the startpoint and the endpoint of all the violating path segments. This method can be extremely tedious for a large number of path segments.

The recommended method of identifying a common leaf cell between the startpoint and the endpoint of all the violating path segments is to perform the bottleneck analysis. For the above case (in Example 13.7), the following command was used to identify the common leaf cell shared by the violating path segments.

pt_shell> report_bottleneck

Example 13.8 illustrates a report that was generated by PT, identifying the cell U15 (BUFF4X) as the common leaf cell shared by the two path segments mentioned above.

Example 13.8

Report  : bottleneck
          -cost_type path_count
          -max_cells 20
          -nworst_paths 100
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 12:09:09 1998

Bottleneck Cost = Number of violating paths through cell

                                Bottleneck
Cell          Reference         Cost

U15           BUFF4X            2.00

Once the cell has been identified, it can be swapped with another in order to fix the timing violation of multiple path segments. Once again, a complete STA should be performed on the entire design. Any required changes (due to cell swapping etc.) should be manually incorporated in the final netlist.
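The cost reported in Example 13.8 is simply a count of violating paths per cell. The Python sketch below illustrates that definition; it is not a description of how PT computes the report internally, and the function name bottleneck_cost and the list-of-cells representation of a path are assumptions made for this illustration.

```python
from collections import Counter


def bottleneck_cost(violating_paths):
    """For every cell, count the violating paths that pass through it."""
    cost = Counter()
    for cells in violating_paths:
        cost.update(set(cells))  # a cell counts once per path
    return cost


# Leaf cells between the startpoint and endpoint of each violating path
paths = [
    ["state_block/U15"],  # st_reg9 to bp_reg2  (Example 13.2)
    ["state_block/U15"],  # st_reg9 to enc_reg0 (Example 13.7)
]

for cell, cost in bottleneck_cost(paths).most_common():
    print(cell, cost)  # state_block/U15 2
```

Sorting by the count, as most_common() does, places the most widely shared cell first, which is the cell worth swapping first.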

13.8.4 Clock Gating Checks

Usually, low power designs contain clocks that are enabled by gating logic only when needed. For such designs, the cell used for gating the clock should be analyzed for setup and hold-time violations, in order to avoid clipping of the clock.

The setup and hold-time requirements may be specified through the set_clock_gating_check command explained in Chapter 12. For example:

pt_shell> set_clock_gating_check -setup 0.5 -hold 0.02 tck

Example 13.9 illustrates the clock gating report that utilized the setup and hold-time requirements specified above for the gated clock, “tck”. The following command was used to generate the report:

pt_shell> report_constraint -clock_gating_setup \
                            -clock_gating_hold \
                            -all_violators

Example 13.9

Report  : constraint
          -all_violators
          -path slack_only
          -clock_gating_setup
          -clock_gating_hold
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 12:30:07 1998

clock_gating_setup

Endpoint                      Slack

state_block/U1789/A1          -1.02 (VIOLATED)
state_block/U1346/A1          -0.98 (VIOLATED)

clock_gating_hold

Endpoint                      Slack

state_block/U1789/A1          -0.10 (VIOLATED)
state_block/U1450/A1          -0.02 (VIOLATED)

It is important to note that the -all_violators option should be used in addition to the -clock_gating_setup and the -clock_gating_hold options. Failure to include the -all_violators option will result in a report displaying only the cost function of the failures, instead of identifying the failed gates.

The -verbose option may also be included to display a full path report for the purpose of debugging the cause of the violation, and how it may be corrected. Example 13.10 illustrates one such report that was generated by using the following command:

pt_shell> report_constraint -clock_gating_hold \
                            -all_violators \
                            -verbose


Example 13.10

Report  : constraint
          -all_violators
          -path slack_only
          -clock_gating_hold
Design  : tap_controller
Version : 1998.08-PT2
Date    : Tue Nov 17 12:32:10 1998

Startpoint: state_block/tst_reg11
            (rising edge-triggered flip-flop clocked by tck)
Endpoint: state_block/U1789
          (rising clock gating-check end-point clocked by tck)
Path Group: **clock_gating_default**
Path Type: min

Point                                   Incr      Path

clock tck (rise edge)                   0.00      0.00
clock network delay (propagated)        2.25      2.25
state_block/tst_reg11/CP (DFF1X)        0.00      2.25 r
state_block/tst_reg11/Q (DFF1X)         0.05*     2.30 r
state_block/U1789/A2 (AND4X)            0.12*     2.42 r
data arrival time                                 2.42

clock tck (rise edge)                   0.00      0.00
clock network delay (propagated)        2.50      2.50
state_block/U1789/A1 (AND4X)                      2.50 r
clock gating hold time                  0.02      2.52
data required time                                2.52

data required time                                2.52
data arrival time                                -2.42

slack (VIOLATED)                                 -0.10


In the above example, the AND4X gate is used to gate the clock, “tck”. Pin A2 of this cell is connected to the enabling signal, whereas the clock drives pin A1 of this cell. As can be seen from the report, the hold-time is being violated by the gating logic. In order to fix the hold-time violation, the cell AND4X may be sized down to slow the data path.
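The clock gating hold check in Example 13.10 follows the same arithmetic as an ordinary hold check, with the clock pin of the gating cell acting as the reference. The Python sketch below reproduces the reported slack; the function name gating_hold_slack is an assumption made for this illustration, and the values are taken from the report.

```python
def gating_hold_slack(enable_arrival, clock_arrival, hold_requirement):
    """Hold slack of the enable pin relative to the clock pin of the gating cell."""
    required = clock_arrival + hold_requirement
    return round(enable_arrival - required, 2)


# Launch clock latency + tst_reg11 clock-to-Q + increment at U1789/A2
enable_arrival = 2.25 + 0.05 + 0.12

slack = gating_hold_slack(enable_arrival,
                          clock_arrival=2.50,     # clock latency at U1789/A1
                          hold_requirement=0.02)  # from set_clock_gating_check -hold
print(slack)  # -0.1
```

The enable reaches pin A2 at 2.42ns, which is 0.10ns earlier than the 2.52ns required, so the check fails by that amount.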

13.9 Chapter Summary

Static timing analysis is key to success, with working silicon as the final product. Static timing analysis not only verifies the design for timing, but also checks all the path segments in the design. This chapter covers all the steps necessary to analyze a design comprehensively through static timing analysis.

The chapter started by comparing static timing analysis to the dynamic simulation method as the tool for timing verification. It was recommended that the former method be used as an alternative to the dynamic simulation approach. This was followed by a detailed discussion on timing exceptions, which included multicycle and false paths. Helpful hints were provided to guide the user in choosing the best approach.

A separate section was devoted to disabling the timing arcs of cells and to performing case analysis. Case analysis was recommended over individual disabling of timing arcs, for designs containing many cells with timing arcs that are related to a common signal. An example case of DFT logic was provided as an application of case analysis.

In addition, the process of analyzing designs both for pre-layout and post-layout was covered in detail, which included clock specification and timing analysis.

Finally, a comprehensive section was devoted to timing reports followed by advanced analysis of the timing reports. At each step, example reports were provided and explained in detail.


Appendix A

A New Timing Closure Methodology using Physical Compiler

Presented at Boston SNUG 2001 by:

Himanshu Bhatnagar
Conexant Systems, Inc.

ABSTRACT

Business success is increasingly dependent upon the ability of development teams to deliver, in the shortest “time-to-market”, a product that meets customer requirements. To mitigate the “time-to-market” pressure, several tools and techniques have evolved over the past few years. As design complexities increase along with the need for higher speeds, timing convergence is one of the biggest challenges faced by design teams. This promotes “timing closure” to the forefront of the design cycle.

Due to added design complexity, testing the device has also become one of the major issues. Designers now must not only code the design for accurate functionality, but also code with “test” in mind.

This paper explores new techniques using Physical Compiler that may be used in order to achieve early timing closure. The paper also describes the design-for-test (or DFT) scan insertion and reordering using Physical Compiler.

INTRODUCTION

Timing closure is one of the biggest forces driving the EDA vendors today. Designs ready to be synthesized get stuck in an infinite loop of synthesis and layout in order to converge on timing. The situation is further aggravated by the separation of front-end and back-end tools. The separation enforced by this artificial wall between the front-end and the back-end causes the design engineer and the layout engineer, in effect, to talk two different languages. The bottleneck occurs when the netlist is thrown over the wall to the layout engineer. The layout engineer, who is responsible for floorplanning, placement and routing, may not have an intricate idea of the complexity associated with the design. To make matters even worse, the different delay calculators of each tool (synthesis and layout) give varying results, adding to the timing convergence problem.

To alleviate this issue, Synopsys introduced Physical Compiler (or PhyC). This tool sits between the front-end synthesis and the back-end layout tools. The idea is to use the same timing constraints, libraries etc. and perform cell placement of the design. Instead of relying on wire-load models, the tool uses Steiner routes as the basis for calculating cell delays. This method provides a more accurate delay computation.


This paper explains a new method of using PhyC (version 2000.11-SP1) in order to achieve early timing closure. In addition, this paper also details scan insertion techniques as well as some shortcomings of PhyC.

DESIGN FLOW IN A NUTSHELL

In order to reduce the design cycle, from RTL coding to final tape-out, we analyzed each step in the design cycle and then devised ways to shorten each step.

The following is a broad outline of a design cycle.

1. RTL coding
2. RTL verification
3. One-pass synthesis with scan insertion
4. Floorplanning
5. Placement
6. Clock tree insertion
7. Routing
8. Static timing analysis

The design cycle steps of RTL coding and simulation require a lot of patience, designer intervention and a tremendous amount of verification. Using sophisticated techniques like formal verification, and state-of-the-art simulators running on faster machines with ample memory, may certainly reduce cycle time. However, the process of coding the design along with verification is not automatic. The rest of the steps can be automated to reduce the time-to-market.

In the past, Synopsys and other users have provided various guidelines in order to automate the synthesis process. These methods provide a comprehensive solution for synthesizing and timing the design. Although not perfect, due to their dependence on wire-load models, they still provide an adequately optimized netlist.

DFT scan insertion is another major bottleneck that requires designers’ intervention. Not only do designers have to code for accurate functionality

they must also comply with DFT rules while coding the design. In the past, scan insertion was performed on the synthesized netlist. However, due to the increasing complexity of designs and problems in timing convergence, designs are being made DFT rule friendly in the source RTL itself.

After synthesis, the design is thrown over the wall to the layout engineer, where it enters the floorplanning step. In general, this stage suffers the most, mainly because of the layout engineer's lack of understanding of the design. If the floorplan does not provide a good starting point, it can cause routing problems due to congestion. This severely impacts the timing of the design.

KEY TO ACHIEVING EARLY TIMING CLOSURE

In order to reduce time, we identified two areas where design cycle time can be reduced tremendously.

- Make RTL, DFT rule compliant early in the design cycle
- Perform timing driven floorplanning

Making designs DFT rule compliant (reducing step3 in section 2.0)

Recently, Synopsys introduced a much-needed command called “rtldrc”. This command runs on the source RTL and identifies problematic test areas, which may violate DFT rules. Usage of this command greatly eases the burden on design engineers of knowing all the DFT rules. With this command, designers can simply code the design for functionality and then run it through rtldrc. If violations occur, they can then modify the design immediately before proceeding further. By doing this, any surprises detected at the end (which may have required RTL changes and re-synthesis) are completely eliminated. The benefit of using this method is that it saves a tremendous amount of time by reducing the iterations involved. In other words, the need to change the synthesized netlist in order to make it DFT friendly is removed.


The following is an example of running rtldrc:

dc_shell-t> analyze -f verilog my_design.v
dc_shell-t> elaborate my_design
dc_shell-t> create_test_clock -p 100 -w {45 55} my_clk
dc_shell-t> set_test_hold 1 my_test_mode
dc_shell-t> set_signal_type test_asynch_inverted my_reset
dc_shell-t> rtldrc

After the design is rtldrc clean, you may proceed with full one-pass synthesis and produce a fully optimized netlist, which is scan ready.

Timing driven floorplan (reducing step4 in section 2.0)

The traditional approach to floorplanning consists of defining the chip area with macro (such as RAM/ROMs etc.) placement, along with routing power and ground straps by hand. Layout engineers create the floorplan based on the following:

1. Connectivity using fly lines in the layout tool
2. Designers' suggestions on block/macro placement

Both of these approaches may not yield optimal timing results. The designer’s view of the logic blocks' connectivity and what the layout tool “sees” can differ dramatically. If the floorplan is not optimal, the poor quality of result in timing can only be realized during post-route static timing analysis. In other words, the design has to be placed, the clock tree inserted and the design routed before the static timing analysis can be performed. If timing analysis fails, the whole process starts again. This methodology wastes valuable time.

Upon realizing this, we created a better solution that utilizes PhyC’s capabilities in placing not only the standard cells, but macros also. We used the following flow to perform this:

1. Defined chip area/aspect ratio and placed the IO pads.
2. Modified the LEF files for macros, so they “look” identical to standard cells.
3. Converted the LEF’s to pdb’s (physical equivalent of logic db’s), using lef2pdb conversion utility
4. Ran physopt and wrote out the PDEF.
5. Read the PDEF into the layout tool.
6. Made a proper floorplan (power straps, shuffle the macros, added blockages etc.)
7. Wrote out the new PDEF
8. Ran physopt using the new PDEF and placed standard cells only (used original pdb’s for macros)

Only a single line in the LEF file for macros needs to be modified in order to perform step 2 as shown below:

Original LEF          Modified LEF

CLASS RING            CLASS CORE
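The edit in step 2 is mechanical and can be scripted. The Python sketch below is one possible way of doing it, not the method used for this paper: the function name make_macro_look_like_std_cell and the sample LEF fragment (macro name, size) are hypothetical. It returns a modified copy and leaves the original text intact, because the original macro data is needed again in step 8.

```python
import re


def make_macro_look_like_std_cell(lef_text):
    """Return a copy of the LEF text with 'CLASS RING' changed to 'CLASS CORE'."""
    return re.sub(r"^([ \t]*)CLASS[ \t]+RING\b", r"\1CLASS CORE",
                  lef_text, flags=re.MULTILINE)


# Hypothetical macro definition, for illustration only
sample = """MACRO my_ram
  CLASS RING ;
  SIZE 100 BY 200 ;
END my_ram
"""

converted = make_macro_look_like_std_cell(sample)
print(converted)
```

Every other line of the macro definition passes through unchanged, so the modified LEF differs from the original only in the CLASS statement.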

The advantage of using the above flow is that we are using the timing driven placement capabilities of PhyC to place both the standard cells and the macros. Once all the cells had been placed, we moved the macros slightly in order to remove minor overlaps (see note below) while still keeping their relative positions. We then proceeded to add power/ground rings around macros along with blockages and power straps before writing out the final floorplanned PDEF. In other words, we beautified the layout surface after all the cells had been timing driven placed.

Using the above flow, we were not only able to converge on timing in a single iteration, but we also significantly reduced the time needed in floorplanning the chip.

Note: While placing macros, PhyC complains that it cannot place multi-row height cells optimally. It places them, but the placement may not be optimal. We saw some overlaps; however, the relative positioning of macros was excellent. Synopsys informed us that this ability of PhyC would be available in the upcoming 2001.08 release.


SCAN CHAIN ORDERING

The advantages of scan chain ordering are enormous. However, usually with every good thing there is also something bad associated with it.

The following are some of the benefits of scan chain reordering:

1. Reduces congestion, thus improves timing
2. Less overall area (net length is dramatically reduced)
3. Improves setup time of functional paths due to decreased flop loading
4. Reduces negative hold-times (mainly a simulation vs. static timing analysis issue)
5. Improves timing due to less overall capacitance
6. Improves power consumption by driving less net capacitance
7. Better clock tree (lower latency and fewer buffers), thus improving timing along with low power consumption.

The disadvantages are:

1. Increases the chance of hold-time violations in scan-path
2. Additional runtime in the design cycle.

Using Clock Skew to reduce Hold Time Violations

With the advantages far outweighing the disadvantages, it seems that scan chain ordering is a must. However, what do we do with the increased hold-time violations? One alternative is to insert a perfectly balanced zero-skew clock tree. Generally, the ASIC community prefers this approach and tries to shoot for the minimum skew clock tree. The hope is that the Clock-to-Q delay of flops + additional RC delay of the wire will exceed the clock skew, thus preventing hold-time violations. In general this approach works best and eliminates most of the hold-time problems. The remaining hold-time problem areas can be individually targeted by adding delay cells along the scan path after the zero-skew clock tree insertion.

The other alternative to minimizing the hold-time violations is to ignore the clock skew completely. The motto here is: “SKEW IS GOOD”. There are two reasons for this. First, skew may actually prevent hold time problems. Secondly, skew reduces the power spikes caused by all clock buffers switching at the same time. Of course, how much skew can be tolerated depends on the speed of the design. Too much skew may lead to setup time issues.

The way the placement tool works is that the flops are generally clustered together in concentric circles (figure 1). When the clock tree is inserted, the clock buffers are placed inside each circle. This arrangement provides a zero clock skew for the flops within a cluster. Zero clock skew for a set of flops within a cluster means that there cannot be any hold-time violations for these flops. In other words, the Clock-to-Q delay of the flop itself will prevent any chance of hold-time violations.


The potential for hold-time violations increases when the data signals cross the cluster boundaries. A separate clock buffer is driving another cluster and may lead to clock skew between the first cluster and the second. In this case, two flops will be affected: the last flop in the scan chain belonging to the first cluster, and the first flop in the scan chain of the second cluster. Here, the skew can be positive or it may be negative.

Figure 2 illustrates a case of two registers with their clocks arriving at different times. If the arrival time of the clock signal at RegB is less than the arrival time of the clock signal at RegA (negative skew), there cannot be any hold time issue for RegB. However, if the clock signal at RegB arrives later than the clock signal at RegA (positive skew), there is a potential for a hold-time violation.

Skew is unbiased; in other words, it can be both positive and negative. Thus 50% of the flops sitting at cluster boundaries will not have hold-time violations while the other 50% may experience them. Now consider the delay of the data path from one cluster to the next. If the clusters are spread apart, the RC delay itself may be larger than the clock skew between clusters. This placement arrangement itself will prevent the hold-time issues. This leaves a very small percentage of flops that are susceptible to hold-time problems. These can be targeted individually by inserting delay cells in their scan paths to fix the hold-time violations.
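The argument above can be condensed into a single inequality: a scan path is safe from hold-time violations when the Clock-to-Q delay plus the wire (RC) delay exceeds the clock skew plus the hold requirement of the capturing flop. The Python sketch below evaluates that inequality for the cases discussed; the function name hold_margin and all the numbers (in ns) are hypothetical, and the hold requirement defaults to zero purely for simplicity.

```python
def hold_margin(clk_to_q, wire_delay, skew, hold_time=0.0):
    """Hold margin of a flop-to-flop scan path.

    skew = clock arrival at the capturing flop (RegB) minus
           clock arrival at the launching flop (RegA).
    A negative result indicates a hold-time violation.
    """
    return round(clk_to_q + wire_delay - skew - hold_time, 3)


print(hold_margin(0.20, 0.00, 0.00))   # same cluster, zero skew:       0.2
print(hold_margin(0.20, 0.05, -0.30))  # negative skew:                 0.55
print(hold_margin(0.20, 0.05, 0.40))   # positive skew, short wire:    -0.15
print(hold_margin(0.20, 0.30, 0.40))   # positive skew, clusters apart: 0.1
```

Only the third case fails, which corresponds to the small percentage of boundary flops that need a delay cell in their scan path.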

Note: If you’re going for a minimum skew clock tree, then clock buffers and inverters with balanced rise/fall times greatly reduce the skew. If you have gated clock trees, then make sure that the gating logic also has balanced rise/fall times.


insert_scan and Hold-Time

The way insert_scan works is that it tries to find the best scan-chain route in order to minimize hold-time violations. It evaluates the logic being driven by the flop and if it finds a gate such as a buffer or an inverter, it will connect the scan chain to the output of this gate instead of connecting the Q output directly to the scan-in port of the next flop. This is an immense help in minimizing the hold-time violations. The following switch controls this behavior:

dc_shell-t> set test_disable_find_best_scan_out false

By setting the above variable to “false”, insert_scan will try to find the best scan out. Setting it to “true” will disable this behavior, and insert_scan will tap the output Q of the flop and link it to the scan-in of the next flop. The default value of this variable is “false”.

PHYSICAL COMPILER ISSUES

- When reading a netlist using read_verilog, PhyC outputs a lot of assign statements in the final Verilog netlist. This only happens when performing scan chain ordering, and occurs even after using the hidden variable: set physopt_fix_multiple_port_nets true. When reading the db file and performing the same operations (including using the above variable), no assign statements are generated. If the Verilog netlist is compiled into the db format, PhyC still produces assign statements. The only method that does not produce the assign statements is when the db is written out with the scan insertion operation done by the DFT compiler.

- The heavily promoted “integrated physopt flow” by Synopsys does not work “as advertised”. The idea is to compile the design using one-pass scan synthesis and then run physopt with the scan order option. Physopt is supposed to perform scan stitching and ordering as well as placement. However, no matter what we did, PhyC complained that the design was not scan ready. Running check_test came out clean, but physopt kept on complaining that the design was not test-ready. This problem was solved by the Synopsys AE (along with a SolvNET article) who told us that we have to set an attribute telling PhyC that the design is test-ready (DUH!!). After using this attribute we were able to run physopt successfully.

ACKNOWLEDGEMENTS

Primarily I would like to thank Eric Tan (Conexant) for doing numerous layouts at my request.

I would also like to thank Elisabeth Moseley, Bob Moussavi and Kelly Conklin from Synopsys who had a lot of patience in answering my questions. Without their help, it would not have been possible for me to write this paper. Thanks are also due to Leah Clark for her help in refining this paper.



Appendix B

Example Makefile

###
# General Macros
###
# SCRIPT, SRC, SYNDB, LOG and NETLIST are not defined in this example;
# they are assumed to be set elsewhere (for example in the environment).

MV = \mv -f
RM = \rm -f
DC_SHELL = dc_shell

CLEANUP = $(RM) command.log pt_shell_command.log

###
# Modules for Synthesis
###

top:
	@$(MAKE) $(SYNDB)/top.db

$(SYNDB)/top.db: $(SCRIPT)/top.scr \
                 $(SRC)/top.v \
                 $(SYNDB)/A.db \
                 $(SYNDB)/B.db
	$(DC_SHELL) -f $(SCRIPT)/top.scr | tee top.log
	$(MV) top.log $(LOG)
	$(MV) top.sv $(NETLIST)
	$(MV) top.db $(SYNDB)
	$(CLEANUP)

A:
	@$(MAKE) $(SYNDB)/A.db

# A target must not list itself as a prerequisite
$(SYNDB)/A.db: $(SCRIPT)/A.scr \
               $(SRC)/A.v
	$(DC_SHELL) -f $(SCRIPT)/A.scr | tee A.log
	$(MV) A.log $(LOG)
	$(MV) A.sv $(NETLIST)
	$(MV) A.db $(SYNDB)
	$(CLEANUP)

B:
	@$(MAKE) $(SYNDB)/B.db

$(SYNDB)/B.db: $(SCRIPT)/B.scr \
               $(SRC)/B.v
	$(DC_SHELL) -f $(SCRIPT)/B.scr | tee B.log
	$(MV) B.log $(LOG)
	$(MV) B.sv $(NETLIST)
	$(MV) B.db $(SYNDB)
	$(CLEANUP)


Index

.synopsys_dc.setup, 21, 48

.synopsys_pt.setup, 21, 48

A

all_violators, 306

allocate_budgets, 139

always, 90, 94

analyze, 23

analyze command, 57

assign, 100

assign statements, 183

asynchronous reset, 95

ATPG test patterns, 162

Attributes, 53

average_capacitance, 70

B

balance_buffer, 149

balanced clock tree, 193

balanced_tree, 69

Behavior Compiler, 5

Behavioral, 5

best_case_tree, 69

blockages, 216

blocking, 100

bottleneck analysis, 302

boundary conditions, 189

boundary scan, 9, 155

brute force method, 213

budget_shell, 137

bus_naming_style, 181

C

capture cycle, 157

cascaded mux, 98

case analysis, 256

case statement, 93, 97, 98

Cell, 52

cell_footprint, 205

change_names, 24, 181

check_design, 182


check_legality, 225

check_test, 162, 168, 220

Clock, 52

clock gating, 87, 254, 305

clock latency, 192

clock network delay, 295

clock skew, 172, 252, 297

Clock Tree Compiler, 17, 227

clock tree insertion, 12, 228

clock tree synthesis, 148, 191

clock_gating_hold, 306

clock_gating_setup, 306

clocked scan, 156

compile, 141

-in_place, 206

compile_disable_area_opt_during_inplace_opt, 206

compile_ignore_area_during_inplace_opt, 206

compile_ignore_footprint_during_inplace_opt, 206

compile_new_boolean_structure, 146

compile_ok_to_buffer_during_inplace_opt, 149, 206, 208

compile_physical, 218, 219

concatenation

lists, 248

connect_net, 196, 212

create_cell, 196, 212

create_clock, 110, 119, 250

create_generated_clock, 111, 123, 253, 269

create_placement, 225

create_port, 196

create_test_clock, 159, 161, 168

create_test_patterns, 163

create_wire_load, 204

CTS, 191

current_design, 23

custom wire-load models, 204

CWLM, 204

D

DC, 46

dc_shell, 46

dc_shell-t, 46

dcsh, 139

dctcl, 139

default clause, 97

default statement, 92

default_inout_pin_cap, 66

default_input_pin_cap, 66

default_max_fanout, 66

default_max_transition, 66

default_operating_conditions, 66

default_output_pin_cap, 66

default_wire_load_mode, 71

default_wire_load_selection, 71

define_name_rules, 22, 181

delay calculation, 77, 237

delay_type, 257

derated, 203

derating, 67

Design, 51

Design Compiler, 46

Design Objects, 51

design reuse, 84

Design Rule Constraints, 108

Design Vision, 46

design_vision, 46

Design-for-Test, 153

Detailed Routing, 196

DFT, 8, 153

DFT Compiler, 47, 153

DFTC, 47

Directives, 58

disconnect_net, 196, 212


dont_touch, 25

DRC, 13, 73, 74, 108

DSPF, 283

DSPF format, 198

DV, 46

Dynamic Simulation, 5

E

ECO, 13

ECO compiler, 14

EDIF, 55, 179

elaborate, 23

elaborate command, 57

else, 249

else statement, 92

elsif, 249

elsif statement, 99

enumerated types, 89

environment file, 7

existing_scan, 164, 168

expr, 247

extraction, 197, 202

extrapolate, 76

F

false paths, 273

False paths, 271

fanin, 212

fanout_length, 70

fanout_load, 73

finite state machines, 89

fishbone, 12

flattening, 143

Flattening, 144

flip-flop, 91

floorplanning, 186

Floorplanning, 32

formal verification, 9

Formality, 9, 47

forward annotating, 31

full_case, 97

G

G2PG, 217, 221

gated clocks, 170

gated resets, 170

gate-level simulation, 20

GDSII, 228

generated clocks, 171

Generated Clocks, 122

get_attribute, 54

get_cells, 54

get_clocks, 54

get_designs, 54

get_lib_cells, 54

get_nets, 54

get_ports, 54

Global Routing, 196

glue logic, 88

glue-logic, 84

ground straps, 217

group, 85

group_path, 118

GTECH, 57

H

HDL, 4

hdlin_enable_rtldrc_info, 168

hdlin_translate_off_skip_text, 61

hierarchy, 89

HOLD timing check, 232

hold-time, 292

Hold-Time Fixes, 39

hold-time violations, 23, 27, 209, 282


I

ideal clock, 295

IEEE packages, 56

IEEE PDEF 3.0 format, 217

if, 249

if statement, 92, 97

ifdef, 59

in place optimization, 39

in_place, 142

in_place_swap_mode, 66, 205

incremental_mapping, 142

in-place optimization, 202, 205

insert_scan, 163, 168

-physical, 219

-physical, 165

integrated G2PG, 224

integrated physopt, 226

INTERCONNECT delay, 231

interpolation, 76

IOPATH delay, 231

IPO, 39, 202, 205

J

JTAG, 9, 155

K

K-factors, 67

L

latch, 91, 92

latches, 170

LBO, 39, 207

lbo_buffer_insertion_enabled, 208

lbo_buffer_removal_enabled, 208

LC, 46

lc_shell, 46

LEF, 64

lef2pdb, 64, 65

legalize_placement, 225

Library, 52

Library Compiler, 46

Library Exchange Format, 64

library group, 65

library level attributes, 66

link_library, 49, 50

link_path, 49, 244

Links to Layout, 178

Lists, 247

load capacitances, 75

location based optimization, 39

Location Based Optimization, 207

logic BIST, 8, 154

logic library, 64

lssd, 156

LTL, 178

lumped parasitic, 198

LVS, 13, 195

M

makefile, 136

map_effort, 141

match_footprint, 66, 205

max_capacitance, 73

max_fanout, 73

max_transition, 54, 73

memory BIST, 8, 154, 173

mixed HDL, 5

multicycle paths, 267

multiple clock domains, 172

multiple clocks, 87

multiplexed flip-flop, 156

muxes, 97

N

Net, 52

NLDM, 75

no_design_rule, 142

non-blocking, 100

non-linear delay model, 75

number_of_nets, 70

O

Open Verilog International, 198

operating_conditions, 68

Optimizing

clock networks, 148

others clause, 94

OVI, 198

P

parallel_case, 97

parasitic capacitances, 32, 37

Partitioning, 84

PATHCONSTRAINT, 187

pdb format, 64

PDEF, 190

PhyC, 46

Physical Compiler, 14, 42, 46, 165

physical library, 64

Physical Synthesis, 17

physical_library, 49

physopt, 221, 222, 223

physopt_fix_multiple_port_nets, 226

physopt_pnet_complete_blockage_layer_names, 216

physopt_pnet_partial_blockage_layer_names, 217

Pin, 52

PLO, 39, 178

Port, 52

post-layout optimization, 39

post-route clock, 284

power straps, 217

pragma, 61, 62

pre-layout clock, 279

Pre-Layout Steps, 23

preview_scan, 161, 168

primetime, 244

PrimeTime, 47

printvar, 53, 216

priority encoder, 98

process, 90, 96

psyn_gui, 47

psyn_shell, 47, 216

PT, 47

pt_shell, 47, 244

R

RAM, 173

RC delays, 34, 37

RC tree model, 67

read command, 56

read_clusters, 35, 190, 203

read_db

-netlist_only, 249

read_edif, 250

read_parasitics, 204, 236, 284

read_pdef, 165, 168, 169, 219

read_sdf, 33, 203, 235, 283

read_verilog, 250

read_vhdl, 250

Reference, 52

remove_attribute, 54, 180, 184

remove_case_analysis, 256

remove_dont_touch_placement, 226

remove_unconnected_ports, 24, 182

reoptimize_design, 36, 41

-in_place, 206

reoptimize_design_changed_list_file_name, 209

report_bottleneck, 260, 304

report_case_analysis, 257


report_congestion, 226

report_constraint, 33, 259

-clock_gating..., 305

report_disable_timing, 255

report_net, 149

report_test, 163

report_timing, 257, 258, 266

-nets -capacitance -transition_time, 297

report_transitive_fanout, 185

Route Compiler, 228

Routing, 32

RSPF, 283

RSPF format, 198

RTL, 5, 86

RTL2PG, 217, 218

rtldrc, 159, 168

run_router, 225

S

same edge clock, 171

scaling factors, 67

scan chain ordering, 163, 164

Scan chain ordering, 172

scan insertion, 155

scan_mode, 158

scan_order, 221

SDF file, 187

SDF file generation, 232

SDF format, 198

SDF generation, 41

SDF Generation, 30

search_path, 49, 244

sensitivity list, 94

sensitivity lists, 90

set bus_naming_style, 22

set hdlin_enable_rtldrc_info, 159

set link_library, 22

set physical_library, 22

set search_path, 22

set symbol_library, 22

set target_library, 22

set test_default_scan_style, 22

set verilogout_no_tri, 22

set verilogout_show_unconnected_pins, 22

set_annotated_check, 237, 241

set_annotated_delay, 234

set_attribute, 54

is_test_ready, 227

set_case_analysis, 256, 276, 281

set_clock_gating_check, 254

set_clock_latency, 115, 119, 233, 251, 279

set_clock_transition, 115, 120, 233, 251, 279

set_clock_uncertainty, 115, 120, 252, 279

set_congestion_options, 225, 227

set_disable_timing, 78, 238, 255, 276

set_dont_touch, 53, 112

set_dont_touch_network, 111, 149, 184

set_dont_touch_placement, 226

set_dont_use, 112

set_drive, 108

set_driving_cell, 108

set_false_path, 117, 272, 273

set_fix_hold, 41, 210

set_fix_multiple_port_nets, 24, 184

set_flatten, 144, 145

set_input_delay, 113, 277

set_input_transition, 255

set_load, 108

set_max_area

-ignore_tns, 132

set_max_capacitance, 109

set_max_delay, 117

set_max_fanout, 109

set_max_transition, 109


set_min_delay, 118, 270

set_min_library, 105, 210

set_multicycle_path, 117, 268

set_operating_conditions, 106, 277

set_output_delay, 114, 277

set_propagated_clock, 116, 121, 236, 251, 284

set_scan_configuration, 161, 164, 167

set_scan_element, 171

set_scan_signal, 161, 168

set_signal_type, 160, 164, 168

set_structure, 144, 146

set_test_hold, 159, 161, 168

set_timing_derate, 256

set_wire_load_mode, 107, 277

set_wire_load_model, 107, 277

setup time, 289

SETUP timing check, 232

setup-time violations, 27

shift cycle, 156

signal assignments, 101

slew rates, 75

SolvNET, 37, 284

source, 33

spare cells, 14

SPEF, 283

SPEF format, 198

spine, 12

standard_deviation, 70

static timing analysis, 10

Structural, 5

structuring, 143

Structuring, 145

swap cells, 301

swap_cell, 263, 301

switch, 249

symbol_library, 49

synchronous reset, 95

synthesis environment, 7

synthesis_off, 61

synthesis_on, 61

T

target_library, 49, 50

Tcl, 245

TDL, 187

temperature, 67

test bench, 5

test pattern generation, 167

test signal, 158

test_asynch, 164

test_asynch_inverted, 160

test_disable_find_best_scan_out, 166

test-ready, 160

TetraMAX, 167

timing constraints, 7

timing driven placement, 11

timing exceptions, 267

timing_driven_congestion, 221

timing_range, 69

TIMINGCHECK, 187

timing-driven-layout, 187

Total Negative Slack, 131

Traditional Design Flow, 2

traditional flow, 22

tran primitives, 183

transcript, 245

translate_off, 59, 61

translate_on, 59, 61

tree_type, 69

tri wires, 183

tri-state bus, 169

tri-state logic, 99

U

ungroup, 85

-flatten, 147


uniquify, 139, 179, 180

unresolved references, 185

update_lib, 36, 205

V

variable assignments, 101

Variables, 52, 246

verbose, 306

Verilog, 4, 55

verilogout_no_tri, 183

verilogout_show_unconnected_pins, 183

verilogout_unconnected_prefix, 53

VHDL, 4, 55

voltage, 67

W

wire_load, 69

wire_load_from_area, 71

wire_load_selection, 71

wire-load models, 17, 42, 197

Worst Negative Slack, 131

worst_case_tree, 69

write_clusters, 190

write_constraints, 31, 187

write_context, 139

write_pdef, 168, 169, 220

write_script, 137

write_sdf, 29, 232, 261

write_sdf_constraints, 262

write_timing, 232

X

X-generation, 236