OpenCL 2.0 Feedback Series:
- OpenCL Standardization Issues
- My OpenCL Vision and Philosophy (this article)
- OpenCL 1.3: My Proposal For a Final 1.x Release
- OpenCL 2.0: SPIR Feedback and Vision
In my last article, I criticized OpenCL for lacking a published vision or philosophy that unifies the community and provides a litmus test for features. In this article I outline my vision for OpenCL, built on a new low-level library model, together with a potential philosophy. I have inferred an OpenCL philosophy from the published standards and vendor documentation; however, in this article I am going to put forward a different one. If you disagree with me, please post a comment on this article. If you write a response to my article, contact me and I will update this entry with a link back to you. My intention is to spark a real public debate on the future of OpenCL.
My Proposed OpenCL Philosophy
OpenCL is a collection of libraries that enable low-level programming of a heterogeneous collection of devices. The primary role of OpenCL is to enable software portability across disparate hardware devices and operating systems. We recognize that new devices will be created, and programming models will change. Therefore, OpenCL cannot provide a single unified interface that will stand the test of time. Instead, OpenCL is a federation of technologies that are made accessible to developers through a common and minimal API.
OpenCL is a low-level standard that supports high-level libraries, which in turn provide language- or domain-specific abstractions to developers. The primary role of OpenCL is to expose what devices are capable of doing, and provide access to these capabilities. OpenCL has no place in deciding how developers should use these capabilities, or what the capabilities should be. Instead, OpenCL defers to partner specifications which use the low-level interface of OpenCL to expose new functionality.
OpenCL recognizes that hardware and software technology are constantly changing, and that extra effort is required to future-proof software for the benefit of consumers.
My Proposed OpenCL Model
OpenCL is a low-level standard designed to provide a common programming interface to disparate devices connected to a host. Today, consumer applications have moved beyond the PC to mobile devices and consumer electronics, such as televisions and game consoles. Compute nodes in an HPC cluster might have access to specialized hardware to accelerate specific scientific applications. OpenCL enables developers to use multiple devices within a single application based on device capabilities. For example, an OpenCL application may use a GPU for image manipulation, a DSP for audio processing, and the CPU for low-latency data-structure operations. OpenCL enables applications to support a heterogeneous world.
The OpenCL host is simply defined as any system capable of calling the collection of C functions provided by the OpenCL host API. This loose definition enables an OpenCL host to be a PC with connected GPU devices, or a master node on a compute cluster with other nodes acting as devices over an interconnect. The separation of host and device is critical to OpenCL, because it enables a heterogeneous environment. In particular, there is no guarantee that host and device share memory, or have the same endianness or ISA. The fundamental role of the OpenCL host API is to discover devices and issue commands to them in a portable manner.
It is not the role of OpenCL to define what a device is, or what commands it should support. This is a tumultuous time for the technology industry as it grapples with software parallelization. On the hardware side, multi-core processors have risen to prominence, along with GPUs, compute clusters and FPGAs. OpenCL future-proofs itself by providing developers with an abstract definition of a device. An OpenCL device could be a multi-core CPU, a DSP, an FPGA, or a remote system that dispatches work to a compute cluster. Each device has substantially different capabilities, and the concept of a “capability” is the key to OpenCL software portability.
Let’s suppose for the sake of argument that software developers are not particularly interested in the specific hardware of a device. OpenCL software developers select devices based on their capabilities. An image-processing application might use a GPU for image manipulation, based on its capability to perform this task. Another application might select a specialized device that generates good random numbers for cryptographic purposes. Suppose that both applications have shipped, and one day a customer purchases a new piece of hardware that can do faster image manipulation and generate better random numbers than the other devices on the host. The software should select this new device, and the customer can immediately benefit without waiting for a software update. This is what software portability should be in a heterogeneous world. OpenCL software selects devices based upon capabilities, rather than type or vendor.
“Wait a second”, you say, “software must take into account hardware-specific features for maximum performance.” I agree with you. But how do we do this now? High-performance software will run, albeit slowly, on any processor. Developers add logic to select hardware-specific execution paths, with a default portable case if the hardware is unknown. If a new device, using the default portable case, is simply faster than another device using the tuned case, then it should be selected by a performance-portable application. This enables customers to use better hardware immediately, and provides software vendors with an opportunity to sell an updated piece of software that is tuned to the new hardware.
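The selection rule argued for above can be sketched in C. Everything in this sketch is hypothetical: the `device_info` struct, the `CAP_IMAGE_FILTER` identifier, and the measured rates are illustrative assumptions, not part of any OpenCL specification. The point is only that the application filters by capability and then ranks by measured speed, regardless of whether a tuned path exists for the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability identifier; not defined by any OpenCL standard. */
#define CAP_IMAGE_FILTER UINT64_C(0x1001)

/* Hypothetical application-side record describing one device. */
typedef struct {
    uint64_t        device_id;
    const uint64_t *capabilities;     /* capability ids reported by the device */
    size_t          num_capabilities;
    int             has_tuned_path;   /* 1 if the application ships a tuned path */
    double          measured_rate;    /* throughput with the path that would run */
} device_info;

/* Return 1 if the device reports the capability, 0 otherwise. */
static int device_has_capability(const device_info *d, uint64_t cap) {
    for (size_t i = 0; i < d->num_capabilities; ++i)
        if (d->capabilities[i] == cap) return 1;
    return 0;
}

/* Return the index of the fastest device that supports cap, or -1 if none.
   has_tuned_path is deliberately ignored: an untuned device that is simply
   faster wins over a tuned but slower one. */
static int select_device(const device_info *devs, size_t n, uint64_t cap) {
    assert(devs != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (!device_has_capability(&devs[i], cap)) continue;
        if (best < 0 || devs[i].measured_rate > devs[best].measured_rate)
            best = (int)i;
    }
    return best;
}
```

In this sketch a newly installed device that only runs the default portable path is still chosen as soon as its measured rate exceeds that of the tuned device, which is the behavior the customer would expect.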
I have mentioned the word “capability” several times, but I have not provided a sensible definition. Quite simply, a capability is something that a device can do, which is of interest to developers. A capability is a complete, documented, and hopefully standardized, software interface. You can consider a capability as identifying an API available for a device. I envision a capability as an integer that identifies a particular standard API, and ioctl-like calls are used to call the API, but I’ll get into the details in a moment. From the perspective of a programmer, devices are only interesting if they support the capabilities (i.e. APIs) required for a particular program. The diagram below illustrates the programmer’s view of devices as collections of capabilities.
There is a final missing concept in my OpenCL model: the implementation. An implementation is installed on the host, and provides access to devices via the OpenCL host API. Typically, an implementation will be shipped by hardware vendors to enable use of a particular device. It is also possible that software vendors provide implementations that do something interesting, without relying upon hardware directly. For example, an implementation might provide a virtual device that transparently uses hardware over the Internet. OpenCL is designed to ensure that multiple implementations can be present on the same host without interfering with each other.
The OpenCL host API is now straightforward. Here is a simple API that could be provided:
* All functions return 0 on success, or an error code on failure.
* The implementation_id is unique.
* The pair (implementation_id, device_id) is guaranteed to be unique and always reference the same hardware on a host.
* Get implementations available on the system.
int32_t clGetImplementations(uint64_t implementation_list_size, uint64_t* implementation_list, uint64_t* num_implementations);
* Query an implementation for its capabilities.
int32_t clGetImplementationCapabilities(uint64_t implementation_id, uint64_t capabilities_list_size, uint64_t* capabilities, uint64_t* num_capabilities);
* Use the capability of an implementation.
int32_t clCallImplementationCapability(uint64_t implementation_id, uint64_t capability, void* command, uint64_t* command_size);
* Query an implementation for the set of devices that it contains.
int32_t clGetImplementationDevices(uint64_t implementation_id, uint64_t device_id_list_size, uint64_t* device_id_list, uint64_t* num_devices);
* Query a device for its capabilities.
int32_t clGetDeviceCapabilities(uint64_t implementation_id, uint64_t device_id, uint64_t capabilities_list_size, uint64_t* capabilities, uint64_t* num_capabilities);
* Use the capability of a device.
int32_t clCallDeviceCapability(uint64_t implementation_id, uint64_t device_id, uint64_t capability, void* command, uint64_t* command_size);
There are three key components to this API:
- Implementation: an implementation of OpenCL, likely provided by a vendor to support some hardware. An implementation does not necessarily provide access to devices; it may do something useful by itself. Therefore, implementations also have capabilities.
- Device: something accessible to the host, which is provided by an implementation.
- Capability: an externally documented command interface supported by a device or implementation. Capabilities provide an ioctl-like interface to OpenCL implementations and devices. Each capability is identified by a unique integer registered with the standards body (some identifiers are reserved for personal or open-source use) and has well-defined, externally documented behavior.
The programmer now has the following view of the world:
The OpenCL host API is very low-level, and resembles a system-call interface. It is not meant to be used directly; instead, its purpose is to preserve binary compatibility in spite of constant technological change. Any high-level library that uses the OpenCL host API will continue to work, so long as implementations continue to support the required capabilities. This design also addresses one of my major concerns regarding feature deprecation in OpenCL.
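As a rough illustration of how a high-level library might sit on top of this host API, here is a minimal C sketch of device enumeration. The two `cl...` functions are stubs with made-up ids, standing in for the proposed calls. The sketch also assumes a convention the proposal does not spell out: passing a NULL list returns only the count, as the existing OpenCL query functions do. The error codes and the helper `count_all_devices` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical non-zero error codes; the proposal only fixes 0 as success. */
enum { ERR_BUFFER_TOO_SMALL = 1, ERR_OUT_OF_MEMORY = 2, ERR_UNKNOWN_IMPLEMENTATION = 3 };

/* Stub for the proposed call: pretends implementations 7 and 42 are installed. */
static int32_t clGetImplementations(uint64_t list_size, uint64_t *list,
                                    uint64_t *num) {
    static const uint64_t installed[] = { 7, 42 };
    const uint64_t count = sizeof installed / sizeof installed[0];
    if (num) *num = count;
    if (list) {
        if (list_size < count) return ERR_BUFFER_TOO_SMALL;
        for (uint64_t i = 0; i < count; ++i) list[i] = installed[i];
    }
    return 0;
}

/* Stub: implementation 7 exposes two devices, implementation 42 exposes one. */
static int32_t clGetImplementationDevices(uint64_t impl, uint64_t list_size,
                                          uint64_t *list, uint64_t *num) {
    uint64_t count;
    if (impl == 7) count = 2;
    else if (impl == 42) count = 1;
    else return ERR_UNKNOWN_IMPLEMENTATION;
    if (num) *num = count;
    if (list) {
        if (list_size < count) return ERR_BUFFER_TOO_SMALL;
        for (uint64_t i = 0; i < count; ++i) list[i] = i; /* ids unique per implementation */
    }
    return 0;
}

/* Count every (implementation, device) pair: query the size, then fill. */
static int32_t count_all_devices(uint64_t *total) {
    assert(total != NULL);
    *total = 0;

    uint64_t num_impls = 0;
    int32_t err = clGetImplementations(0, NULL, &num_impls);
    if (err != 0 || num_impls == 0) return err;

    uint64_t *impls = malloc(num_impls * sizeof *impls);
    if (!impls) return ERR_OUT_OF_MEMORY;

    err = clGetImplementations(num_impls, impls, NULL);
    for (uint64_t i = 0; err == 0 && i < num_impls; ++i) {
        uint64_t num_devs = 0;
        err = clGetImplementationDevices(impls[i], 0, NULL, &num_devs);
        if (err == 0) *total += num_devs;
    }
    free(impls);
    return err;
}
```

A wrapper like this is the kind of code that would live in a high-level library, so that applications never call the low-level entry points themselves.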
In my OpenCL model and host API, a capability is any defined interface. Each capability should be externally documented and standardized. Capabilities should generally be fine-grained to have value, although this is not required. Capabilities might also depend upon other capabilities. Feature deprecation now becomes capability deprecation, which means that new implementations might no longer support that capability. However, as the number of OpenCL-enabled applications increases, capability deprecation will become a drastic and rare event. Deprecation also does not break the binary compatibility of shipped applications.
I would like to illustrate the benefit of this approach with some concrete, though fictitious, example capabilities.
- Random Number Generator: an interface for obtaining random numbers. This might be supported by a sound card, or specialized hardware attached to the host.
- Shared Memory Support: the device has access to shared memory with virtual addressing.
- Mappable Memory: the device is capable of mapping memory.
- OpenCL 1.2: the device supports OpenCL 1.2, which is accessed by the new low-level OpenCL host API. OpenCL 1.2 calls translate to the new low-level calls.
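To make the ioctl-like calling style concrete, here is a minimal C sketch of how the fictitious random number capability might be invoked through `clCallDeviceCapability`. The capability id `CAP_RANDOM_BYTES`, the `rng_command` layout, the error codes, and the in/out meaning of `command_size` are all assumptions made for illustration. The function body is a stub that writes a counting pattern rather than real entropy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAP_RANDOM_BYTES UINT64_C(0x2001)  /* hypothetical capability id */

/* Hypothetical non-zero error codes. */
enum { ERR_UNSUPPORTED = 1, ERR_BAD_COMMAND = 2, ERR_TOO_MANY_BYTES = 3 };

/* Hypothetical command block that the capability's documentation would define. */
typedef struct {
    uint32_t requested;  /* in:  number of bytes wanted (at most 32) */
    uint32_t produced;   /* out: number of bytes written */
    uint8_t  bytes[32];  /* out: the bytes themselves */
} rng_command;

/* Stub for the proposed call. A real implementation would route the command
   to the device; this one fills a counting pattern so the flow can be tested. */
static int32_t clCallDeviceCapability(uint64_t impl, uint64_t dev, uint64_t cap,
                                      void *command, uint64_t *command_size) {
    (void)impl; (void)dev;
    if (cap != CAP_RANDOM_BYTES) return ERR_UNSUPPORTED;
    if (!command || !command_size || *command_size < sizeof(rng_command))
        return ERR_BAD_COMMAND;
    rng_command *cmd = command;
    if (cmd->requested > sizeof cmd->bytes) return ERR_TOO_MANY_BYTES;
    for (uint32_t i = 0; i < cmd->requested; ++i)
        cmd->bytes[i] = (uint8_t)(i + 1);
    cmd->produced = cmd->requested;
    *command_size = sizeof(rng_command);  /* assumed: bytes of command written back */
    return 0;
}

/* High-level wrapper: hides the command block from the application. */
static int32_t get_random_bytes(uint64_t impl, uint64_t dev,
                                uint8_t *out, uint32_t n) {
    assert(out != NULL);
    rng_command cmd;
    memset(&cmd, 0, sizeof cmd);
    cmd.requested = n;
    uint64_t size = sizeof cmd;
    int32_t err = clCallDeviceCapability(impl, dev, CAP_RANDOM_BYTES, &cmd, &size);
    if (err != 0) return err;
    memcpy(out, cmd.bytes, cmd.produced);
    return 0;
}
```

The application only ever sees `get_random_bytes`. The command layout belongs to the capability's external documentation, which is what would allow it to stay binary compatible across implementations.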
Reflections on the OpenCL Model
I believe that the model I’ve outlined, or one like it, is necessary to reduce the volatility of OpenCL support. The OpenCL 1.x standards already provide several options and extensions, and vendors provide their own extensions on top of that. For example, image support is optional for hardware. It is likely that not all devices will support shared memory as outlined in the provisional OpenCL 2.0 specification, and those that do will not all support it in the same way. Not all devices can compile OpenCL C, and there is already a plethora of OpenCL versions deployed. Currently, the OpenCL standard provides the ND-range execution model to execute OpenCL C kernel functions. Although this is a nice abstraction, we cannot view it as universally useful. In time, computer science may discover better execution models, which should then be supported.
I view my proposed low-level host API, based on the capabilities of devices, as a natural evolution of OpenCL. I also view my model as a minor adjustment to OpenCL which provides a wide array of benefits without significant penalty. OpenCL can become a simpler standard. For example, basic OpenCL C code supporting the ND-range execution model can become a capability. Image support can become a new capability, which cleanly extends OpenCL C and the host API as required. A capability can be defined which has complete support for one of the OpenCL standards. Implementations that already have complete support for OpenCL will not have to change much. However, hardware with different execution models, or even just a selection of built-in functions, can support a weaker capability without implementing the entire standard.
OpenCL does support extensions, and I’d like to address why they are not sufficient. Optional OpenCL extensions have gradually become required features for conformance. Although I am not experienced in hardware design, I imagine that if this continues, the required features will eventually limit the ability of hardware vendors to support OpenCL. OpenCL extensions also don’t exist in a vacuum: core features of OpenCL, such as the execution model, may limit extensions that challenge them. We must anticipate change, because hardware and software technology have several revolutions to go before we get parallelization right.
I will conclude this section by mentioning that the OpenCL 1.x and OpenCL 2.0 versions can be transparently mapped to this model. Capabilities can be created for those standards, and OpenCL libraries could be adjusted to call the new API without significant change.
The Role of The Standards Body
My vision for OpenCL changes the role of the standards body and of the current OpenCL standard. Rather than a single monolithic standard, an ecosystem of standardized capabilities is produced. In particular, each capability is a standardized standalone API that provides a specific functionality. It is critical that capabilities, and the standards that define them, do not change once published, aside from clarifications to eliminate misunderstandings. A new version of a standard should be a new capability, and if a standard is extended, then an implementation of the new capability naturally supports the previous one as well.
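One way to picture the versioning rule is a small registry in which each capability records the capability it extends. The C sketch below is only an illustration under that assumption: the identifiers, the `capability_record` type, and the idea of a single "extends" link are hypothetical, and a real registry could equally have devices report every supported capability explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability ids; 0 is reserved here to mean "extends nothing". */
enum {
    CAP_NONE       = 0,
    CAP_OPENCL_1_0 = 100,
    CAP_OPENCL_1_1 = 101,
    CAP_OPENCL_1_2 = 102
};

/* One immutable registry entry: a capability and the one it extends. */
typedef struct {
    uint64_t capability;
    uint64_t extends;
} capability_record;

static const capability_record registry[] = {
    { CAP_OPENCL_1_0, CAP_NONE },
    { CAP_OPENCL_1_1, CAP_OPENCL_1_0 },
    { CAP_OPENCL_1_2, CAP_OPENCL_1_1 },
};

/* Return the capability that cap extends, or CAP_NONE if unknown or a root. */
static uint64_t predecessor_of(uint64_t cap) {
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; ++i)
        if (registry[i].capability == cap) return registry[i].extends;
    return CAP_NONE;
}

/* Return 1 if a device reporting `reported` also satisfies `required`,
   either directly or through the chain of extended capabilities. */
static int capability_satisfies(uint64_t reported, uint64_t required) {
    assert(required != CAP_NONE);
    for (uint64_t cap = reported; cap != CAP_NONE; cap = predecessor_of(cap))
        if (cap == required) return 1;
    return 0;
}
```

Because published entries never change, an application built against `CAP_OPENCL_1_0` keeps working on a device that reports only `CAP_OPENCL_1_2`, while the reverse check fails as it should.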
The official OpenCL standards body has a new role in allocating capability identifiers. There will always be a need for unofficial capabilities, which are used by projects and vendors for testing new ideas. However, the standards body should provide a set of official capabilities which are stable, and which vendors have agreed to implement. Care must be taken to ensure that official capabilities do not overlap in functionality without proper justification.
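The split between official and unofficial identifiers could be encoded in the identifier itself. The C sketch below assumes, purely for illustration, that the top bit of the 64-bit capability id marks the unofficial range. Nothing in the proposal fixes such a layout, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the top bit marks ids that were not allocated by the
   standards body (experiments, personal and open-source projects). */
#define CAP_UNOFFICIAL_BIT (UINT64_C(1) << 63)

typedef enum { CAP_CLASS_OFFICIAL, CAP_CLASS_UNOFFICIAL } capability_class;

/* Classify an id by the range it falls in. */
static capability_class classify_capability(uint64_t cap) {
    return (cap & CAP_UNOFFICIAL_BIT) ? CAP_CLASS_UNOFFICIAL
                                      : CAP_CLASS_OFFICIAL;
}

/* Build an unofficial id from a project-local number. */
static uint64_t make_unofficial_capability(uint64_t local_id) {
    assert((local_id & CAP_UNOFFICIAL_BIT) == 0);
    return local_id | CAP_UNOFFICIAL_BIT;
}
```

With a scheme like this, a library can warn when an application depends on an unofficial capability without consulting any external table.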
One of the most important advantages of OpenCL is that it has provided a common programming model and language for so many hardware architectures, namely OpenCL C. I regard OpenCL C as a good thing, but we must acknowledge that it is an evolving language and might ultimately suffer from conceptual flaws. The capability model I propose could easily provide an OpenCL C capability, meaning that new programs can be compiled and loaded. If OpenCL C loses popularity due to the introduction of new program representations, then the new representation and OpenCL C can coexist without undue complication to user applications or developers.
I expect some controversy over my suggested vision, because the programming model I propose is quite different. I hope this blog sparks an interesting, public discussion on the future of OpenCL and accelerated applications. The primary role of my suggested capability model is to enable deprecations to occur within the industry without causing binary compatibility issues or developer stress. If my suggestions are found to have some appeal, I would be happy to cooperate with members of Khronos to prepare a formal specification. I will also note that the capability model, aside from its other advantages, would make OpenCL into a type of hardware-service discovery system, leading to interesting service-oriented applications. In my opinion, that is the ultimate goal of heterogeneous application development.