The OpenCL 3.0 Finalized Specification was released on September 30th 2020
Read the Blog about the final release of OpenCL 3.0 Provisional Press Release Provisional Launch Presentation
OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new unified API and OpenCL C 3.0 language specifications and introduces extensions for asynchronous data copies to enable a new class of embedded processors.
OpenCL speeds applications by offloading their most computationally intensive code onto accelerator processors - or devices. OpenCL developers use C or C++-based kernel languages to code programs that are passed through a device compiler for parallel execution on accelerator devices.
OpenCL provides the industry with the lowest 'close-to-metal' processor-agile execution layer for accelerating applications, libraries and engines, and also providing a code generation target for compilers. Unlike 'GPU-only' APIs, such as Vulkan, OpenCL enables use of a diverse range of accelerators including multi-core CPUs, GPUs, DSPs, FPGAs and dedicated hardware such as inferencing engines.
As the industry landscape of platforms and devices grows more complex, tools are evolving the enable OpenCL applications to be deployed onto platforms that do not have available native OpenCL drivers. For example, the open source clspv compiler and clvk API translator enable OpenCL applications to be run over a Vulkan run-time. This gives OpenCL developers significant flexibility on where and how they can deploy their OpenCL applications.
An OpenCL application is split into host and device parts with host code written using a general programming language such as C or C++ and compiled by a conventional compiler for execution on a host CPU.
The device compilation phase can be done online, i.e. during execution of an application using special API calls. It can alternatively be compiled before executing the application into the machine binary or special portable intermediate representation defined by Khronos called SPIR-V. There are also domain specific languages and frameworks that can compile to OpenCL either using source-to-source translations or generating binary/SPIR-V, for example Halide.
Application host code is frequently written in C or C++ but bindings for other languages are also available, such as Python. Kernel programs can be written in a dialect of C (OpenCL C) or C++ (C++ for OpenCL) that enables a developer to program computationally intensive parts of their application in a kernel program. All versions of the OpenCL C language are based on C99. The community driven C++ for OpenCL language brings together capabilities of OpenCL and C++17.
The OpenCL working group has transitioned from the original OpenCL C++ kernel language first defined in OpenCL 2.0 to C++ for OpenCL developed by the open source community to provide improved features and compatibility with OpenCL C. C++ for OpenCL is supported by Clang and its documentation can be found here. It enables developers to use most C++17 features in OpenCL kernels. It is largely backwards compatible with OpenCL C 2.0 enabling it to be used to program accelerators with OpenCL 2.0 or above with conformant drivers that support SPIR-V. Its implementation in Clang can be tracked via the OpenCL Support Page.
Some extensions are available to the existing published kernel language standards. The full list of such extensions is documented here . Conformant compilers and drivers may optionally support the extensions and so there is a mechanism to detect their support at the compile time. Developers should be aware that not all extensions may be supported across all devices.
Here you can view a list of hardware vendors with Conformant OpenCL Implementations