Vulkan SC: Overview - and how it is different from the Vulkan you already know

March 1, 2022 by Daniel Koch, NVIDIA, with contributions from the Vulkan SC Working Group

Why Safety Critical?

Demand for advanced GPU-accelerated graphics and compute is growing in a wide range of industries where safety is paramount, such as automotive and avionics. When a compute or display system failure would pose a significant safety risk, it is vital that systems meet safety-critical standards such as ISO 26262.

Vulkan SC is a low-level, deterministic API that enables safety-critical system implementers to deploy state-of-the-art GPU graphics and compute acceleration by streamlining the system-level safety certification process. Vulkan SC can also be invaluable for real-time embedded applications, even if not formally safety certified.

Vulkan SC 1.0 is evolved from Vulkan 1.2 and includes the removal of runtime functionality that is not needed in safety-critical markets, an updated design to provide predictable execution times and results, and clarifications to remove potential ambiguity in its operation. The definitive list of changes and added functionality is documented in the Vulkan SC 1.0 Specification Appendix H: Vulkan SC Deviations from Base Vulkan.

This article provides a summary of the updates that have been made to create Vulkan SC 1.0, and explains how these new capabilities are typically used in applications. It is assumed that the reader has a reasonable understanding of the Vulkan API.

Safety Critical Valid Usage

The Vulkan SC API follows the Vulkan philosophy of requiring very limited error checking in production. As with Vulkan, the Vulkan SC specification contains extensive Valid Usage statements that define the conditions an application must meet in order to achieve well-defined runtime behavior. However, this does not mean that a safety-certified system will have no error checking, merely that the specification does not mandate that it be performed in the driver.

Safety certification is performed on a complete system, including the application, the Vulkan SC implementation, other system libraries, and even the hardware. It is up to each vendor and system integrator to determine which component(s) are responsible for input validation and error checking. The application could be responsible for input validation and error checking, the implementation could do validation to reject inputs that would result in incorrect behavior, or formal methods could be used to prove that no possible inputs will result in undefined execution. Responsibilities and requirements should be clearly documented in the appropriate vendor-specific safety documentation.

It is expected that validation layers will be supported in development environments but they are unlikely to be supported on deployment systems. Khronos has no plans to provide a safety certified loader or validation layers.

How Does Vulkan SC 1.0 Differ from Vulkan 1.2?

Instance Creation

In order to distinguish between Vulkan and Vulkan SC API versions, an API variant was retroactively added to the packed Vulkan version number in Vulkan 1.2.175 by partitioning the upper 3 bits off of the major version field. Vulkan uses a variant of 0, and is thus backwards compatible with the previous version partitioning. Vulkan SC uses a variant version of 1. This affects the pApiVersion that is returned from vkEnumerateInstanceVersion and the VkApplicationInfo::apiVersion that is requested via vkCreateInstance. A version of VKSC_API_VERSION_1_0 is used to specify a Vulkan SC 1.0 application, and VK_API_VERSION_1_2 is used to specify a Vulkan 1.2 application.

There is also a VK_API_VERSION_VARIANT macro that can be used to determine whether an instance supports Vulkan SC (VKSC_API_VARIANT == 1) or Vulkan (variant 0). Note that a single implementation is not expected to support both Vulkan and Vulkan SC; the variant is intended to enable early detection of a driver/application mismatch.
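The version partitioning described above can be illustrated with a small sketch. It does not include the Vulkan headers; the `sketch_*` functions and `SKETCH_*` constants are local stand-ins, written on the assumption that the packed layout is a 3-bit variant, 7-bit major, 10-bit minor, and 12-bit patch field, as used by the VK_MAKE_API_VERSION family of macros.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (not the real Vulkan macros). Assumed bit layout:
 * variant: bits 31..29, major: bits 28..22, minor: bits 21..12, patch: bits 11..0. */
#define SKETCH_VARIANT_VULKAN    0u
#define SKETCH_VARIANT_VULKAN_SC 1u

uint32_t sketch_make_api_version(uint32_t variant, uint32_t major,
                                 uint32_t minor, uint32_t patch)
{
    /* Each field must fit into its bit range. */
    assert(variant < 8u && major < 128u && minor < 1024u && patch < 4096u);
    return (variant << 29) | (major << 22) | (minor << 12) | patch;
}

uint32_t sketch_version_variant(uint32_t version) { return version >> 29; }
uint32_t sketch_version_major(uint32_t version)   { return (version >> 22) & 0x7Fu; }
uint32_t sketch_version_minor(uint32_t version)   { return (version >> 12) & 0x3FFu; }
uint32_t sketch_version_patch(uint32_t version)   { return version & 0xFFFu; }

/* Decoding as it was before the variant existed: the top 10 bits are the major. */
uint32_t sketch_legacy_major(uint32_t version) { return version >> 22; }

/* Returns 1 when the reported version belongs to a Vulkan SC implementation. */
int sketch_is_vulkan_sc(uint32_t reported_version)
{
    return sketch_version_variant(reported_version) == SKETCH_VARIANT_VULKAN_SC;
}
```

With this layout, a Vulkan 1.2 version decodes identically under the old and new partitioning, while a Vulkan SC 1.0 version read with the legacy 10-bit major field yields a major version of 129, which makes a mismatch easy to spot.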

Features and Properties

In general, Vulkan SC 1.0 requires all the same features as Vulkan 1.2. Notable exceptions to this are shader atomic instructions, multiview, timeline semaphores, and depth-stencil resolve, all of which are made optional, and the Vulkan memory model, which is made mandatory. The features were made optional in order to reduce driver complexity and verification burden. The memory model was made mandatory in order to provide well-defined memory semantics and make it possible to reason about the correctness of memory operations.

Vulkan SC-specific device features and properties can be determined by passing the VkPhysicalDeviceVulkanSC10Features structure to vkGetPhysicalDeviceFeatures2, and the VkPhysicalDeviceVulkanSC10Properties structure to vkGetPhysicalDeviceProperties2, respectively.

The VkPhysicalDeviceVulkanSC10Properties structure introduces a number of Vulkan SC-specific properties and limitations which may be more stringent than Vulkan. Some types of memory recycling and freeing operations are optional, as are certain command pool and command buffer operations. Vulkan SC also adds a number of potentially more restrictive limits on render passes, subpasses, framebuffers, and command buffers in order to make static memory reservations more feasible and bounded.

Vulkan SC also adds a new memory heap property (VK_MEMORY_HEAP_SEU_SAFE_BIT) which enables an implementation to denote memory heaps which are robust against single event upsets (more on this later).

Device Creation

Vulkan SC applications must provide some mandatory structures and information to device creation at start-up time. These include the VkDeviceObjectReservationCreateInfo structure which provides the offline compiled pipelines and maximum object counts for static memory reservations (more on these below), and the VkPhysicalDeviceVulkanSC10Features structure which determines which Vulkan SC-specific features are enabled. If these structures are not included, vkCreateDevice will fail with VK_ERROR_INITIALIZATION_FAILED. If the provided pipeline cache is not compatible with the device, VK_ERROR_INVALID_PIPELINE_CACHE_DATA is returned. If more logical devices or other objects are requested than can be allocated simultaneously, VK_ERROR_TOO_MANY_OBJECTS is returned. Some platforms may have limitations on the number of devices that can be created per-instance. If the application wishes to register the fault handling callback, this is done by including the VkFaultCallbackInfo structure.

Offline Compiled Pipelines

Vulkan SC does not support online pipeline compilation (Section 10.7), and thus all pipelines must be compiled offline using an implementation-specific pipeline cache compiler (PCC). The PCC takes as input a JSON file describing the pipeline and all related states, along with the corresponding SPIR-V modules for that pipeline. These are compiled into final machine representation and stored in a pipeline cache entry for the given pipeline. All of the pipeline cache entries are combined into one (or more) pipeline caches which are provided as part of device creation in the VkDeviceObjectReservationCreateInfo structure (in the pipelineCacheCreateInfoCount and pPipelineCacheCreateInfos members) and loaded into VkPipelineCache object(s) by the application. The pipeline cache information provided to device creation is not loaded at that point, but the implementation may scan all pipeline cache entries in order to determine the worst-case sizes for driver-internal structures related to pipelines, as discussed below.

A pipeline cache consists of a documented header (VkPipelineCacheHeaderVersionSafetyCriticalOne) and pipeline cache index (VkPipelineCacheSafetyCriticalIndexEntry), optional debug information, and the implementation-specific pipeline binary storage. The pipeline cache index includes the pipeline identifier (pipelineIdentifier) and the amount of pipeline memory required for this pipeline’s binary (pipelineMemorySize). The pipeline identifier for a given pipeline can be provided as input to the PCC via the JSON file, or it can be assigned by the PCC. If debug information is enabled for the pipeline cache, the JSON file for the pipeline state and the SPIR-V modules for each pipeline stage are also included in the pipeline cache – these are used for validation or debugging purposes, and are not used by the driver. Including debug information can significantly increase the size of the pipeline cache. Khronos has provided a Pipeline Cache Utility to enable applications and tools to easily extract the standardized information from the pipeline cache.

Pipeline Pools

Memory for pipelines is reserved by the implementation at device creation time. The application specifies a number of pools of fixed size entries (see VkPipelinePoolSize) in the VkDeviceObjectReservationCreateInfo structure (pipelinePoolSizeCount and pPipelinePoolSizes).

At runtime, the application can create pipelines as needed and the memory is distributed from a fixed size entry in the appropriate pipeline pool, based on the application’s provided pool assignment (poolEntrySize). When the pipeline is destroyed, the pipeline pool entry becomes available for subsequent re-use. The memory requirements for pipelines can vary significantly depending on the shader complexity and this bucketing of similarly sized pipelines into pools enables the application to minimize the amount of excess memory allocated for pipelines without resulting in fragmentation as pipelines are created and destroyed at runtime.

It is expected that the developer will do some offline processing of the pipeline cache in order to determine an appropriate set of pool sizes for their pipelines. The pcinfo tool (part of the Pipeline Cache Utility) has command line options to list the pipeline memory requirements and to suggest assignment of pipelines to buckets of requested sizes.
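The kind of offline bucketing described here can be sketched as follows. This is not the pcinfo algorithm; it is a minimal illustration that assumes every pipeline is alive at the same time and assigns each one to the smallest bucket that fits. `SketchPoolBucket` and the `sketch_*` functions are hypothetical names, loosely modelled on the entry size and entry count idea of VkPipelinePoolSize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pipeline pool description. */
typedef struct SketchPoolBucket {
    uint64_t entry_size;   /* size in bytes of each fixed-size entry */
    uint32_t entry_count;  /* number of pipelines assigned to this pool */
} SketchPoolBucket;

/* Assigns each pipeline to the smallest bucket whose entry size fits it.
 * `buckets` must be sorted by ascending entry_size.
 * Returns the number of pipelines that fit into no bucket. */
size_t sketch_assign_pipelines(const uint64_t *pipeline_sizes, size_t pipeline_count,
                               SketchPoolBucket *buckets, size_t bucket_count)
{
    size_t unassigned = 0;

    for (size_t b = 0; b < bucket_count; ++b) {
        assert(b == 0 || buckets[b - 1].entry_size <= buckets[b].entry_size);
        buckets[b].entry_count = 0;
    }
    for (size_t p = 0; p < pipeline_count; ++p) {
        size_t b = 0;
        while (b < bucket_count && buckets[b].entry_size < pipeline_sizes[p])
            ++b;
        if (b == bucket_count)
            ++unassigned;          /* no bucket is large enough */
        else
            ++buckets[b].entry_count;
    }
    return unassigned;
}

/* Total bytes that would be reserved across all pools. */
uint64_t sketch_total_reserved(const SketchPoolBucket *buckets, size_t bucket_count)
{
    uint64_t total = 0;
    for (size_t b = 0; b < bucket_count; ++b)
        total += buckets[b].entry_size * buckets[b].entry_count;
    return total;
}
```

For example, with bucket sizes of 4096 and 65536 bytes, pipelines of 1000, 3000, and 4096 bytes land in the first pool and a 60000-byte pipeline in the second, for a total reservation of 77824 bytes. Comparing that total across different candidate bucket sizes is one way to weigh excess memory against the number of pools.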

Pipeline Creation

As described above, pipelines are only compiled offline in Vulkan SC, and this necessitates some changes to pipeline creation. VkShaderModule objects are not used in Vulkan SC. Instead, pipelines are loaded solely from the pipeline caches that were created by the offline PCC and loaded into VkPipelineCache objects at runtime. Pipeline caches in Vulkan SC always need to be created with the VK_PIPELINE_CACHE_CREATE_READ_ONLY_BIT and VK_PIPELINE_CACHE_CREATE_USE_APPLICATION_STORAGE_BIT flags set, to indicate that they will not be modified at runtime, and the application will maintain the contents of the memory pointed to by pInitialData for the life of the pipeline cache object so that the driver does not need to make a copy of the data.

When a pipeline is created, a VkPipelineOfflineCreateInfo structure must be provided with each Vk*PipelineCreateInfo structure. This structure provides the pipeline identifier (pipelineIdentifier) that is used by the implementation to locate the appropriate pipeline cache entry in the pipeline cache, and specifies which pipeline memory pool the pipeline memory should be loaded into (using poolEntrySize). The VkPipelineCacheSafetyCriticalIndexEntry::pipelineMemorySize parameter in the pipeline cache gives the minimum pool size required for each pipeline, and the VkPipelineOfflineCreateInfo::poolEntrySize specifies the bucket size of the pre-allocated pipeline pool (as requested at device creation time) that the pipeline will use.

If the VkPipelineOfflineCreateInfo structure is not provided, or if the identified pipeline is not present in the provided pipeline cache, pipeline creation will fail with the VK_ERROR_NO_PIPELINE_MATCH result.
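The lookup that pipeline creation performs can be modelled with a short sketch. It is only an illustration of the matching logic, not driver code: `SketchIndexEntry`, `SketchResult`, and `sketch_find_pipeline` are hypothetical stand-ins, the identifier is assumed to be a 16-byte UUID, and the "pool entry too small" result is an invention of this sketch (a real implementation may instead treat that case as invalid usage).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_SIZE 16  /* assumed identifier length */

/* Hypothetical stand-in for one entry of the pipeline cache index. */
typedef struct SketchIndexEntry {
    uint8_t  pipeline_identifier[SKETCH_UUID_SIZE];
    uint64_t pipeline_memory_size;  /* minimum pool entry size for this pipeline */
} SketchIndexEntry;

typedef enum SketchResult {
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_NO_PIPELINE_MATCH,     /* identifier not found in the cache */
    SKETCH_ERROR_POOL_ENTRY_TOO_SMALL   /* hypothetical, for this sketch only */
} SketchResult;

/* Finds the cache entry for `identifier` and checks that it fits into a pool
 * entry of `pool_entry_size` bytes. On success, *out_entry points at the entry. */
SketchResult sketch_find_pipeline(const SketchIndexEntry *index, size_t entry_count,
                                  const uint8_t *identifier, uint64_t pool_entry_size,
                                  const SketchIndexEntry **out_entry)
{
    assert(index != NULL && identifier != NULL && out_entry != NULL);
    *out_entry = NULL;

    for (size_t i = 0; i < entry_count; ++i) {
        if (memcmp(index[i].pipeline_identifier, identifier, SKETCH_UUID_SIZE) != 0)
            continue;
        if (index[i].pipeline_memory_size > pool_entry_size)
            return SKETCH_ERROR_POOL_ENTRY_TOO_SMALL;
        *out_entry = &index[i];
        return SKETCH_SUCCESS;
    }
    return SKETCH_ERROR_NO_PIPELINE_MATCH;
}
```

An application-side tool could run a check of this shape over its own pipeline list at build time, so that a missing identifier or an undersized pool assignment is caught before deployment rather than at pipeline creation.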

All pipeline state must be provided at pipeline creation time and it must match the state that was provided to the PCC via the JSON file for this pipeline. The only exceptions are the module, pName, and pSpecializationInfo members of the VkPipelineShaderStageCreateInfo structure, which are irrelevant at runtime.

Maximum Object Counts

In Vulkan SC, data structures for objects are reserved by the implementation at device creation time in order to enable implementations to rely solely on static memory management at run-time. The VkDeviceObjectReservationCreateInfo structure provides upper bounds on the simultaneous number of objects of each type that can be allocated during the lifetime of the VkDevice. The application must provide upper bounds on the number of objects of all types that will exist at any point in time during the application’s lifetime, as well as upper limits for certain other object properties. The purpose of this information is to enable implementations to pre-allocate host-side memory structures for the maximum number and size of each object (for example using a bitmap allocator), if needed. The *RequestCount members of the VkDeviceObjectReservationCreateInfo structure in effect set a “high-water mark” for each type of object. The max* members of the VkDeviceObjectReservationCreateInfo structure further constrain maximum properties of certain object types (e.g. array layers or mip levels of an image, number of queries in a pool, etc.). During the application’s lifetime, objects can then be created and destroyed as needed up to these limits and the driver does not need to do any runtime memory allocations. Exceeding these limits is invalid usage and results in undefined behavior, although implementations can choose to return an error.
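As an illustration of the bitmap allocator mentioned above, here is a minimal sketch of a fixed-capacity slot allocator for one object type. It is one possible shape for such an allocator, not a description of any actual driver: `SketchObjectPool` and the `sketch_pool_*` functions are hypothetical, the capacity is capped at 64 slots so that a single word can hold the bitmap, and returning -1 on exhaustion is this sketch's choice of error reporting.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SLOTS 64u  /* one 64-bit word holds the whole bitmap */

/* Hypothetical fixed-capacity pool for a single object type. */
typedef struct SketchObjectPool {
    uint64_t used_bits;      /* bit i set means slot i is in use */
    uint32_t request_count;  /* upper bound fixed at creation time */
} SketchObjectPool;

void sketch_pool_init(SketchObjectPool *pool, uint32_t request_count)
{
    assert(request_count <= SKETCH_MAX_SLOTS);
    pool->used_bits = 0;
    pool->request_count = request_count;
}

/* Returns a free slot index, or -1 when the requested count is exhausted.
 * No memory is allocated: the slot storage was sized up front. */
int sketch_pool_acquire(SketchObjectPool *pool)
{
    for (uint32_t i = 0; i < pool->request_count; ++i) {
        uint64_t mask = (uint64_t)1 << i;
        if ((pool->used_bits & mask) == 0) {
            pool->used_bits |= mask;
            return (int)i;
        }
    }
    return -1;
}

/* Marks a slot as free so that a later acquire can reuse it. */
void sketch_pool_release(SketchObjectPool *pool, int slot)
{
    assert(slot >= 0 && (uint32_t)slot < pool->request_count);
    uint64_t mask = (uint64_t)1 << slot;
    assert((pool->used_bits & mask) != 0);  /* slot must currently be in use */
    pool->used_bits &= ~mask;
}
```

With a request count of 2, two acquires succeed, a third is refused, and releasing either slot makes it available again. This mirrors the "high-water mark" behavior: creation and destruction are free to interleave as long as the simultaneous count stays within the reserved bound.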

Object Lifetimes

Most objects can be created and destroyed as needed, provided that no more than the requested number are in existence at any point in time (as requested by the VkDeviceObjectReservationCreateInfo structure(s)). Device memory allocations (VkDeviceMemory), swapchains (VkSwapchainKHR), and pool objects (VkCommandPool, VkDescriptorPool, VkQueryPool) cannot be explicitly freed or destroyed. If the VkPhysicalDeviceVulkanSC10Properties::deviceDestroyFreesMemory property is supported, the memory from these objects and any storage reserved by the implementation in response to VkDeviceObjectReservationCreateInfo structures is returned to the system when the device is destroyed, otherwise it might not be returned to the system until the process is terminated.

Command Pools and Buffers

When a command pool is created, a Vulkan SC application must include a VkCommandPoolMemoryReservationCreateInfo structure. This determines how much memory (commandPoolReservedSize) is allocated at command pool creation time for use by all command buffers recorded from this pool. When a command pool is created, the number of command buffers reserved (commandPoolMaxCommandBuffers) is permanently counted against the total number of command buffers requested via VkDeviceObjectReservationCreateInfo::commandBufferRequestCount, even if the command buffers are freed at a later time.

Command buffers can be allocated and freed in a pool as needed, up to the VkCommandPoolMemoryReservationCreateInfo::commandPoolMaxCommandBuffers limit specified when the command pool was created. Once command buffers are freed, they can once again be allocated from the command pool. However, when a command buffer is freed, the memory used by the command buffer is not returned back to the parent command pool until vkResetCommandPool is called. When a command pool is reset, the resources from all command buffers allocated from the command pool are returned back to the command pool, and are once again available for use in command buffer recording.

Each command recorded into a command buffer has an implementation-dependent size that counts against commandPoolReservedSize. Applications are expected to estimate their worst-case command buffer memory usage at development time using vkGetCommandPoolMemoryConsumption and reserve large enough command buffers. This command can also be used at runtime to verify expected memory usage. While the memory consumption of a particular command is implementation-dependent, it is a deterministic function of the parameters to the command and of the objects used by the command (including the command buffer itself). That is to say, repeating the same set of commands from the same initial state will result in deterministic command buffer memory consumption.
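The accounting rules for command pools can be captured in a small model. This is a sketch of the bookkeeping only, under the assumption that recording simply consumes bytes from the pool's reservation; `SketchCommandPool` and the `sketch_*` functions are hypothetical, and returning 0 when the reservation is exhausted is this sketch's convention rather than specified behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a command pool's fixed reservation. */
typedef struct SketchCommandPool {
    uint64_t reserved_size;      /* plays the role of commandPoolReservedSize */
    uint64_t consumed;           /* bytes consumed by recording since the last reset */
    uint32_t max_buffers;        /* plays the role of commandPoolMaxCommandBuffers */
    uint32_t allocated_buffers;  /* command buffers currently allocated */
} SketchCommandPool;

void sketch_create_pool(SketchCommandPool *pool, uint64_t reserved_size,
                        uint32_t max_buffers)
{
    pool->reserved_size = reserved_size;
    pool->consumed = 0;
    pool->max_buffers = max_buffers;
    pool->allocated_buffers = 0;
}

/* Returns 1 on success, 0 when the pool's command buffer limit is reached. */
int sketch_allocate_command_buffer(SketchCommandPool *pool)
{
    if (pool->allocated_buffers == pool->max_buffers)
        return 0;
    ++pool->allocated_buffers;
    return 1;
}

/* Freeing makes the buffer slot reusable but does NOT return recorded memory. */
void sketch_free_command_buffer(SketchCommandPool *pool)
{
    assert(pool->allocated_buffers > 0);
    --pool->allocated_buffers;
}

/* Records a command of `command_size` bytes; returns 0 if it does not fit. */
int sketch_record_command(SketchCommandPool *pool, uint64_t command_size)
{
    if (command_size > pool->reserved_size - pool->consumed)
        return 0;
    pool->consumed += command_size;
    return 1;
}

/* Resetting the pool is the only operation that returns recorded memory. */
void sketch_reset_pool(SketchCommandPool *pool)
{
    pool->consumed = 0;
}
```

In this model, a pool with 1024 reserved bytes accepts one 600-byte command and refuses a second one, and it keeps refusing even after a command buffer is freed. Only a pool reset makes the memory available again, which is why worst-case consumption has to be measured and reserved in advance.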

There are a number of Vulkan SC specific properties reported in VkPhysicalDeviceVulkanSC10Properties which indicate whether certain command buffer features are available. Recording multiple command buffers from the same command pool is only supported if the commandPoolMultipleCommandBuffersRecording property is supported. Resetting individual command buffers (with vkResetCommandBuffer) is only supported if the commandPoolResetCommandBuffer property is supported by the implementation. Finally, commandBufferSimultaneousUse indicates whether VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT is supported.

Command pools cannot be destroyed or trimmed in Vulkan SC and the VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT flag is not supported.

Fault Handling

Handling of faults is particularly important in safety-critical devices, and Vulkan SC includes enhanced functionality that enables the communication of fault events between the Vulkan SC implementation and the application. This includes support for registering an application function that the implementation can call if a fault is detected, via VkFaultCallbackInfo at device creation, and a function that allows the application to query the currently registered faults on demand, vkGetFaultData. Two new device properties, maxQueryFaultCount and maxCallbackFaultCount, have been added to specify the maximum number of faults that can be reported by the implementation at a time.

Also provided is the VkFaultData structure, which captures the data for a single fault. It includes enums that categorize the fault type and level, and its pNext member can be used to provide implementation-specific data on a fault via an implementation-defined structure. Vendors are encouraged to extend the fault handler with implementation-specific details, and this should be documented in the vendor’s safety documentation.

Additionally, we have added a VK_ERROR_VALIDATION_FAILED VkResult that can be returned from any command with a return value to report any validation errors detected. There is also a fault type, VK_FAULT_TYPE_INVALID_API_USAGE, that can be reported with a fault record, which covers commands that do not have return values as well.
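The two reporting paths, callback and on-demand query, can be modelled with a short sketch. It is a simplified illustration and not the Vulkan SC API: all `Sketch*` types and `sketch_*` functions are hypothetical, the enum values are a reduced stand-in set, the capacity of 4 is an arbitrary stand-in for a maxQueryFaultCount value, and the "unrecorded" flag and clear-on-query behavior are assumptions of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_QUERY_FAULTS 4u  /* arbitrary stand-in capacity */

typedef enum SketchFaultType {
    SKETCH_FAULT_TYPE_IMPLEMENTATION,
    SKETCH_FAULT_TYPE_INVALID_API_USAGE
} SketchFaultType;

typedef enum SketchFaultLevel {
    SKETCH_FAULT_LEVEL_WARNING,
    SKETCH_FAULT_LEVEL_RECOVERABLE,
    SKETCH_FAULT_LEVEL_CRITICAL
} SketchFaultLevel;

typedef struct SketchFault {
    SketchFaultType  type;
    SketchFaultLevel level;
} SketchFault;

typedef void (*SketchFaultCallback)(const SketchFault *fault, void *user_data);

/* Fixed-capacity fault log: no allocation happens when a fault is reported. */
typedef struct SketchFaultLog {
    SketchFault faults[SKETCH_MAX_QUERY_FAULTS];
    uint32_t count;
    int unrecorded;               /* set when a fault did not fit into the log */
    SketchFaultCallback callback; /* optional, registered once at init */
    void *user_data;
} SketchFaultLog;

void sketch_fault_log_init(SketchFaultLog *log, SketchFaultCallback callback,
                           void *user_data)
{
    log->count = 0;
    log->unrecorded = 0;
    log->callback = callback;
    log->user_data = user_data;
}

/* Implementation side: store the fault if there is room, then notify. */
void sketch_report_fault(SketchFaultLog *log, SketchFault fault)
{
    if (log->count < SKETCH_MAX_QUERY_FAULTS)
        log->faults[log->count++] = fault;
    else
        log->unrecorded = 1;
    if (log->callback != NULL)
        log->callback(&fault, log->user_data);
}

/* Application side: copy out the stored faults and clear the log.
 * `out` must have room for SKETCH_MAX_QUERY_FAULTS entries. */
uint32_t sketch_get_fault_data(SketchFaultLog *log, SketchFault *out, int *unrecorded)
{
    assert(out != NULL && unrecorded != NULL);
    uint32_t n = log->count;
    for (uint32_t i = 0; i < n; ++i)
        out[i] = log->faults[i];
    *unrecorded = log->unrecorded;
    log->count = 0;
    log->unrecorded = 0;
    return n;
}

/* Example callback: counts critical faults in a caller-provided counter. */
void sketch_count_critical(const SketchFault *fault, void *user_data)
{
    if (fault->level == SKETCH_FAULT_LEVEL_CRITICAL)
        ++*(uint32_t *)user_data;
}
```

The bounded log is the point of the sketch: because the maximum number of reportable faults is known up front, fault reporting itself never needs to allocate memory, and the application can size its own buffers from the advertised limits.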

Object Refresh

Many safety-critical environments are required to contend with single event upsets (SEUs). These occur when a bit in a physical device’s memory or register is inadvertently flipped. It is typical for host memory to include automatic error detection (EDC) or correction (ECC) on platforms where this is a concern, but device-accessible memory, for example device-local memory on discrete GPUs, may not have these protections. In that case data stored in non-SEU-safe memory must be periodically reloaded or regenerated.

Standard Vulkan already gives explicit control over most device memory data. For example, applications can reload VkBuffer data from a safe location, or re-generate a VkImage, using standard Vulkan APIs. However, on some implementations certain objects such as VkPipelines may implicitly store information in device memory without an explicit VkDeviceMemory allocation, and Vulkan does not provide any method to refresh these objects short of completely destroying and recreating them. Vulkan SC adds an extension, VK_KHR_object_refresh, to handle refreshing of these implicit device memory allocations.

Applications can query the implementation using vkGetPhysicalDeviceRefreshableObjectTypesKHR to discover which Vulkan object types have implicit device memory allocations and can be explicitly refreshed with this extension. vkCmdRefreshObjectsKHR is then used to refresh the object’s device memory data from a safe copy stored in SEU-safe memory. For the purposes of synchronization, vkCmdRefreshObjectsKHR is considered a transfer write operation.

Vulkan Functionality Removed from Vulkan SC 1.0

In keeping with the safety-critical philosophy of reducing code complexity wherever possible, there are a number of Vulkan features and commands that have not been included in Vulkan SC 1.0. In general, any extension that was promoted to core in Vulkan 1.1 or 1.2 is only supported in the core formulation in order to avoid duplicate functionality. For non-promoted extensions, only a subset of Vulkan extensions have been included in Vulkan SC 1.0. Any deprecated aliases are not included as there are no code compatibility concerns.

As the pipeline compilation process is handled completely offline in Vulkan SC, there is no need for shader modules and related functionality (vkCreateShaderModule, etc.). Similarly there is no need for pipeline derivatives and pipeline cache utility functions (vkMergePipelineCaches, vkGetPipelineCacheData).

Memory handling has been made more deterministic with the removal of sparse resources and sparse memory binding support, elimination of application allocation callbacks, and removal of memory freeing (vkFreeMemory) and pooled object destructors (vkDestroy*Pool). The expectation is that an application will create all memory allocations and pools at initialization time and then (re-)use these through the application’s lifetime in a deterministic manner in order to avoid memory fragmentation. Individual object creation and destruction is supported through the use of the static memory handling via the object reservation API.

Command pools have been simplified by the removal of the trim functionality (vkTrimCommandPool) and the ability to release resources on reset (VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT). Some other aspects of command buffers and command pools have also been made optional (multiple simultaneous recording, simultaneous use, etc). Resource descriptors have been simplified with the removal of descriptor update templates.

Wrapping Up and Looking Forward

The Vulkan SC 1.0 API successfully evolves the proven, explicit level of control in Vulkan 1.2, and streamlines it further to reduce safety certification documentation and testing surface area. Vulkan SC enables state-of-the-art GPU-accelerated graphics and computation to be deployed in safety-critical systems that are certified to meet industry functional safety standards.

Vulkan SC 1.0 introduces offline pipeline compilation and static device object memory handling to provide deterministic behavior and predictable execution times, together with new fault handling and object refresh mechanisms for robust run-time operation. Finally, Vulkan SC 1.0 eliminates, or makes optional, aspects of the Vulkan 1.2 API that would be challenging to certify or that contribute significantly to driver complexity.

The Vulkan® SC 1.0 API Specification is now publicly released, and the Vulkan SC Conformance Test Suite is freely available as open source. Multiple vendors, including CoreAVI and NVIDIA, already have officially conformant Vulkan SC 1.0 implementations. Industry feedback on the specification is welcome at the Vulkan SC specification GitHub repository.

Stay tuned for future developments, including release of additional ecosystem tooling, and further blog installments which will discuss ways to author Vulkan SC applications.