OpenCL 2.0: SPIR Feedback and Vision

OpenCL 2.0 Feedback Series:

OpenCL Standardization Issues
My OpenCL Vision and Philosophy
OpenCL 1.3: My Proposal For a Final 1.x Release
OpenCL 2.0: SPIR Feedback and Vision (this article)

The SPIR 1.2 provisional specification is critical to the success of OpenCL because it provides freedom to the software community to explore new device languages. Personally, I have my own ideas for new device languages and abstractions but I have been unable to explore them due to inflexibility within the OpenCL standard. Although SPIR 1.2 is close to what I want, it isn’t quite good enough. In this article I am going to outline what I want SPIR to be, so that I can stop complaining and start developing software.

What is Wrong with OpenCL?

OpenCL provides some motivation for its existence within the standard:

“OpenCL supports a wide range of applications, ranging from embedded and consumer software to HPC solutions, through a low-level, high-performance, portable abstraction. By creating an efficient, close-to-the-metal programming interface, OpenCL will form the foundation layer of a parallel computing ecosystem of platform-independent tools, middleware and applications.” [1]

Tim Mattson (Principal Engineer, Intel Corp.) also provides a quote which outlines the motivation for OpenCL.

“OpenCL, however, is an unusually complex parallel programming standard. It has to be. I am aware of no other parallel programming model that addresses such a wide array of systems: GPUs, CPUs, FPGAs, embedded processors, and combinations of these systems. OpenCL is also complicated by the goals of its creators. You see, in creating OpenCL, we decided the best way to impact the industry would be to create a programming model for the performance-oriented programmer wanting full access to the details of the system. Our reasoning was that, over time, high-level models would be created to map onto OpenCL. By creating a common low-level target for these higher level models, we’d enable a rich marketplace of ideas and programmers would win. OpenCL, therefore, doesn’t give you many abstractions to make your programming job easier. You have to do all that work yourself.” [2]

I completely agree with these goals and objectives, but OpenCL is certainly not achieving them today. The problem is that OpenCL has not really provided a low-level standard; instead, it is bundled with abstractions and machine models that are continuously tweaked. This is happening because OpenCL has not defined a clear software-hardware interface that separates hardware and software developer concerns. OpenCL has become incredibly complicated, with full and embedded profiles, custom devices, function deprecations, and continuous adjustments to the standard manifested in new versions. The fundamental problem is that OpenCL is attempting to do everything.

If OpenCL is to succeed, it is going to have to reduce its scope and allow a software ecosystem to develop. Don’t get me wrong, I really love OpenCL, but I am frustrated by my inability to experiment with new abstractions and models. OpenCL does not expose close-to-the-metal features that do not map easily onto the philosophy of OpenCL C, such as 24-bit integers, or new memory features like the AMD global data share. Instead, OpenCL provides an abstraction for programmers that hardware vendors map to their unique devices. This would be a great approach if we knew that our abstractions were correct and sufficiently general, but we don’t!

Is there any abstraction in OpenCL that I actually like? The execution model, buffers or the device model? No, and it’s not because I’m pessimistic. I do not believe that OpenCL should provide software abstractions directly; instead, OpenCL should be a thin layer above the physical hardware, as outlined in the quotes above. OpenCL should expose low-level interfaces to the hardware, and abstractions such as the current OpenCL 1.2 specification should be built on top of those. OpenCL should not include a compiler, or even a view of memory. Instead, OpenCL should provide the bare interfaces required to query and control hardware, leaving the rest to software tools. What is unique about OpenCL is the agreement of hardware vendors to build a portable software interface to their devices. The abstractions and programming models should be left to software developers.

What Does SPIR Fix?

SPIR provides a low-level representation of a program to run on a device. I have been careful to not mention kernels or OpenCL C in that last sentence for a good reason. SPIR has enormous potential to provide me with a low-level target for my own compiler and tools, so that I can stop complaining about OpenCL and start programming. Theoretically, if I can generate SPIR instructions that represent a program and load them onto a device, then I’m ready to start working on new tools and technology. But the problem is that SPIR still has too many dependencies on OpenCL C, and the memory and execution models of OpenCL. So, SPIR is still too far away from the hardware. I have to take my new abstractions and translate them to SPIR using OpenCL abstractions, before I can directly reach the hardware. Unique hardware features such as the global data share are not going to be visible to me in SPIR due to its dependency on OpenCL C.

The problem is that SPIR and OpenCL C cannot exist in a vacuum. You cannot write software without a concept of memory, and without some form of execution. OpenCL C and SPIR are representations of programs that have been built based on execution and memory models within the OpenCL standard. These models are abstractions that have already changed over time despite the fact that the underlying hardware has remained the same. The goal of SPIR should be to expose the unchanging low-level capabilities of the physical hardware, and to allow high-level abstractions to decide which functionality to use. In particular, OpenCL C should be compiled to this low-level version of SPIR based on the capabilities of the device and structure of the C program. We have to recognize that there is not a useful and common subset of hardware features in heterogeneous computing. The OpenCL standard already provides the inelegant embedded profile and custom device to address differences in hardware features. We have to embrace a flexible model in which hardware exports services that describe what it can do, and leaves the rest to software. I have made an argument for something like this in this article.

I would be happy if SPIR provided a low-level representation of a program to run on a device, where the execution and memory models could be dependent upon the particular device. Although I believe in the power of abstraction to unify disparate devices, I do not believe that SPIR is the place for this to occur. Abstractions should be provided at a higher level. By default, SPIR should be capable of doing nothing, and a mechanism should be provided to query the fine-grained capabilities of the device. I view the memory and execution models as capabilities, along with support for things such as 24-bit integers, global data shares, and half-precision types. If this is done, then SPIR is so low-level that it appears almost useless and you might wonder if I would be happier to abandon OpenCL and directly use the device ISA. You certainly should be asking me why I think such an approach has any elegance whatsoever!
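To make the idea of a capability query concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Device` class, its methods, and the capability names are hypothetical and do not come from the SPIR or OpenCL specifications. It only shows the shape of the idea, namely that a device can do nothing by default and software asks which capabilities are present.

```python
from dataclasses import dataclass, field

# Hypothetical capability names; none of these come from the SPIR specification.
INT24 = "int24"
GLOBAL_DATA_SHARE = "global_data_share"
HALF_PRECISION = "fp16"
OPENCL_12_MEMORY_MODEL = "opencl_1_2_memory_model"


@dataclass(frozen=True)
class Device:
    """A device that can do nothing until capabilities are listed for it."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)

    def supports(self, capability):
        """True when the device advertises the given capability."""
        return capability in self.capabilities

    def missing(self, required):
        """Return the required capabilities this device does not offer."""
        return set(required) - self.capabilities


# A device with no capabilities, and one that advertises three of them.
bare = Device("bare")
gpu = Device(
    "example-gpu",
    frozenset({INT24, GLOBAL_DATA_SHARE, OPENCL_12_MEMORY_MODEL}),
)
```

A compiler or middleware layer could call `gpu.missing(...)` with the capabilities a program needs and decide whether to run it directly, translate it, or reject it.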

The elegance in this approach is that there is a clearly defined software-hardware interface. The hardware vendor has to provide me with SPIR, and expose the capabilities of the device in a library. That’s it. My job is to figure out how to map high-level languages and abstractions to your device based on the properties that make it unique. This is not the ISA directly, because we have these fine-grained capabilities that hardware vendors still agree upon for particular feature subsets. For instance, if your device supports 32-bit atomic operations as a capability, all vendors have agreed upon a standard representation of that feature within SPIR. In a more complicated situation, hardware transactional memory can become a feature that is exposed via SPIR without major changes to the entire OpenCL memory model! If SPIR were this flexible, then hardware vendors and software vendors would have clearly defined roles and applications would actually be more portable.

How would applications be more portable if SPIR were this flexible? Suppose that SPIR programs are written in such a way that I can easily figure out where each capability is used. For the sake of argument, suppose that we are five years in the future and we have an application in SPIR based on OpenCL 1.2. Hardware has changed in the future so that the previous notions of global, local, and private memory regions are now obsolete. We have a new memory abstraction, and a shiny new piece of hardware but we need to run this old OpenCL 1.2 program. What do we do? We rely upon the software ecosystem to produce a tool that can translate SPIR code based on the old memory model to the new one. The translated legacy program might not use hardware features optimally, but it runs on new hardware which might be faster anyway. The software that translated memory models was only possible because the OpenCL standard anticipated that its own programs would eventually become obsolete due to the introduction of new hardware features and software abstractions.
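The translation tool described above can be sketched as a toy in Python. This is not real SPIR: the `Instr` record, the capability names, and the single translation rule are all hypothetical. The only point is that if every instruction records which capability it uses, a tool can find the obsolete ones and rewrite them.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    capability: str  # hypothetical name of the capability this instruction uses
    op: str
    args: tuple


def translate_local_load(instr):
    """Rewrite a load from the old 'local' region as a load from a
    hypothetical unified address space, tagged with a 'scratch' segment."""
    (address,) = instr.args
    return [Instr("unified_memory", "load", ("scratch", address))]


# Translation rules keyed by (obsolete capability, operation).
TRANSLATORS = {
    ("ocl12_local_memory", "load"): translate_local_load,
}


def port(program, device_caps):
    """Return a program that only uses capabilities in device_caps.

    Raises LookupError when an instruction cannot be translated."""
    out = []
    for instr in program:
        if instr.capability in device_caps:
            out.append(instr)  # still supported, keep unchanged
            continue
        rule = TRANSLATORS.get((instr.capability, instr.op))
        if rule is None:
            raise LookupError(f"no translation for {instr.capability}:{instr.op}")
        replacement = rule(instr)
        if any(r.capability not in device_caps for r in replacement):
            raise LookupError(f"translation of {instr.op} needs an unsupported capability")
        out.extend(replacement)
    return out


# A legacy program that uses the old local memory region.
legacy = [
    Instr("int32_arith", "add", (1, 2)),
    Instr("ocl12_local_memory", "load", (16,)),
]
ported = port(legacy, {"int32_arith", "unified_memory"})
```

The translated program may not be optimal for the new device, but it runs, which is the property the argument above depends on.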

Flexible SPIR Permits Hardware Secrecy

It has been brought to my attention that hardware vendors love secrecy, which might explain part of the apparent dysfunction of the standardization process. Hardware vendors need to expose software support for their new product lines, but they want to keep those upcoming products secret until they are announced. If vendors prematurely release software support, they provide competitors with an opportunity to understand their new product line. I am not going to ask that hardware vendors change their culture, but by understanding this need we can design around it. If hardware vendors agreed upon fine-grained capabilities in secret within Khronos, and did not release the specification for those capabilities until the last minute, everybody would be happy. Software developers simply see a new capability which they can use or ignore, and hardware vendors are not forcing software to be refactored to work with a new OpenCL version. In fact, the capability model I proposed along with SPIR provides an opportunity for OpenCL versions to be eliminated altogether!

Flexible SPIR for Innovative Hardware

Suppose that you are a hardware manufacturer that has just developed a new device, and you want OpenCL applications to be able to use it. What do you do? Today, you have to implement OpenCL C and the host API. There are open source implementations, but you still have quite a bit of work to do. If a flexible version of SPIR were provided which adopted my proposal for capabilities, then all you would have to do is provide implementations of the fine-grained capabilities that you actually support, and something to load SPIR code onto your device. Everything else is an abstraction provided by high-level libraries and middleware. Your job as a hardware vendor is to expose access to your hardware in a standard manner. By enumerating the capabilities of your device, my application can immediately start to use your device without any adjustment to software. The role of middleware is to assemble these low-level device capabilities into high-level concepts that are exposed in a useful way to applications.
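As a rough illustration of the middleware role, the sketch below maps a device's enumerated capabilities to the high-level abstractions that can be built from them. The abstraction names, the capability names, and the `available_abstractions` helper are all hypothetical, chosen only to show the direction of the dependency.

```python
# Hypothetical table: each high-level abstraction lists the low-level
# capabilities that middleware would need in order to build it.
ABSTRACTIONS = {
    "work_group_reduction": {"local_scratch", "barrier"},
    "lock_free_queue": {"atomics_int32", "global_memory"},
    "image_sampling": {"texture_unit"},
}


def available_abstractions(device_caps):
    """Return the abstractions whose requirements the device satisfies,
    sorted by name so the result is deterministic."""
    caps = set(device_caps)
    return sorted(name for name, needs in ABSTRACTIONS.items() if needs <= caps)


# What a vendor of a new device might enumerate.
new_device = ["global_memory", "atomics_int32", "barrier"]
```

With this split, the vendor only publishes the list; deciding what can be assembled from it is entirely a software concern.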

OpenCL Extensions

SPIR 1.2 provides support for extensions, and you might wonder if this is sufficient. It isn’t. The problem with extensions is that they do not allow replacement of core things like the memory or execution models. The best that OpenCL can do with an extension mechanism is generate something like the lava flow anti-pattern. Capabilities are designed to be cleanly separated, though it might help to think of my proposal as shifting all functionality into extensions and leaving the core standard empty.

The Image Problem

In my opinion, direct support for images in OpenCL is a mistake. OpenCL should be a very low-level layer that defers most functionality to high-level libraries, and that includes image support. Why are images so special that they should be a core consideration of the standard? Why can’t images be implemented as a library on top of buffers? What does this have to do with SPIR? Images are a very interesting mistake because they demonstrate something fundamental about heterogeneous computing!

You cannot write image support as a library (so far) because images may use special hardware on a GPU. OpenCL C must know that the user is performing image operations to compile efficient code for the device. The opaque image types and image functions communicate user intent to the compiler so that it can generate efficient device code for image manipulation. As an intermediate between OpenCL C and the device, SPIR must support images so that the device can generate efficient code. Essentially, images represent a high-level operation that can be efficiently implemented at a low-level, but only if the high-level intent is communicated all the way down to the lowest level of the software stack. So images are not special per se, but they are an example of something general that is missing from OpenCL.

Suppose that my suggestion for a capability model is accepted and that SPIR is extended with a sparse matrix opaque type along with a fixed set of associated functions. High-level languages can expose the sparse matrix capability to developers, and generate SPIR code that expresses program logic using opaque sparse matrix types and accessor functions. There are now two situations, either a device is available with direct support for sparse matrices or it isn’t. If a device supports sparse matrices, it may have an efficient mapping from SPIR to its ISA, or it might provide direct hardware support. Alternatively, if no device has direct support for sparse matrices from the hardware vendor, I can write middleware that will examine the capabilities of a device and translate the incoming SPIR with sparse matrices into SPIR for that specific device. Hardware vendors can develop new devices that accelerate popular capabilities and existing programs can directly benefit when the new device is released.
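The two situations can be sketched in Python. The `sparse_matrix` capability name and the `select_spmv` helper are hypothetical; the fallback is an ordinary sparse matrix-vector product over the compressed sparse row (CSR) layout, standing in for whatever the middleware would generate for a device without direct support.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Software fallback: y = A @ x with A in compressed sparse row form."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Entries of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def select_spmv(device_caps, native_spmv=None):
    """Pick the vendor routine when the device advertises the hypothetical
    'sparse_matrix' capability, otherwise fall back to the generic loop."""
    if "sparse_matrix" in device_caps and native_spmv is not None:
        return native_spmv
    return spmv_csr


# A = [[1, 2], [0, 0]] in CSR form, multiplied by x = [3, 4].
result = select_spmv(set())([0, 2, 2], [0, 1], [1.0, 2.0], [3.0, 4.0])
```

The calling program is the same in both cases; only the selection step knows whether the device accelerates the operation.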

Whether or not you agree with my solution, SPIR must acknowledge the image problem.

The OpenCL Software-Hardware Interface: SPIR

If my proposals are adopted then the high-level interface of OpenCL will change to match the following diagram:

Applications will be built on top of libraries that use device capabilities to provide new abstractions. OpenCL becomes a low-level standard that defers most functionality to high-level software, which aligns with the stated vision for the specification. The capability interface permits hardware and software to change independently, which is absolutely critical due to the state of industry flux introduced by parallel programming and heterogeneous computing. The capability model also provides a stable binary interface akin to a kernel system call.

SPIR must be decoupled from OpenCL C so that new languages can be developed. At this time, the SPIR specification references functions provided by OpenCL C and in my opinion this introduces a bad dependency. Instead, an OpenCL C compiler should be implemented using SPIR as a target, and rely upon opaque types and capabilities for special functionality. SPIR should be very simple, and OpenCL C built-in functions should be implemented through capabilities or SPIR code. I view the reference of built-in functions within the SPIR specification as a conceptual flaw within OpenCL.

SPIR is also important to permit the linking of multiple source files written in different programming languages. SPIR should be the ISA of a conceptual machine which by default does nothing, until capabilities are added one-by-one that describe what this machine can actually do. The following diagram demonstrates the ideal situation for OpenCL applications:

There is another critical point to note, just in case you haven’t noticed it yet. If SPIR is decoupled from OpenCL C, then hardware vendors can concentrate exclusively on SPIR implementations and leave OpenCL C to be developed by the open source community. In particular, there is no reason why hardware vendors could not cooperatively develop an open source OpenCL C compiler. This will benefit the entire community because it will provide a very stable toolchain for developers, while freeing resources for hardware vendors to concentrate on useful optimizations within SPIR.

Is OpenCL Still Useful?

You might wonder whether or not I have “missed the point” of OpenCL. The OpenCL specification provides OpenCL C, memory and execution models, and abstractions so that I don’t have to care about the underlying hardware. I propose that OpenCL and SPIR should become something more general, and potentially more complicated for the developers who actually write applications. Isn’t this a bad thing?

I understand the desire for OpenCL to be useful “out of the box”. However, OpenCL has stated that one of its objectives is to enable an ecosystem of portable middleware and software, and in my opinion OpenCL has failed at this objective. I am itching to do some really neat things with OpenCL, but I am shackled by the specification to use abstractions that I actually don’t find useful. I really want the OpenCL specification to provide an escape hatch for developers like myself who want to build cool new technology. My goal is to start a software company that uses OpenCL as a low-level platform that exposes hardware functionality, and permits me to develop my own innovations for parallel programming.

I also see that my suggestions require relatively minor adjustment to the standard now, and can provide enormous benefits by providing a clear software-hardware interface. Right now, the OpenCL standard is interfering with my ability to write amazing software due to the unclear boundaries between software and hardware concerns. OpenCL has really demonstrated its utility and unified hardware vendors, but I am asking for more flexibility so that I can make inroads into the software marketplace. People will buy software products from me that really unleash OpenCL.

What About Fragmentation?

So let’s say all of my wishes come true, and I get my flexible version of SPIR and a capability model. What happens if everybody starts to develop abstractions on top of OpenCL, and the software marketplace becomes fragmented with incompatible abstractions? Good! This means that OpenCL has lived up to its promise, and the software community has an increased probability of finding the abstractions that really work. This in turn benefits the hardware vendors, because OpenCL is helping you sell devices. I assure you that software developers are quite familiar with the situation in which there are many ways to solve a problem. Although we love to fight over which abstraction we should use in which situation, we are actually happiest when we have a diverse set of abstractions from which to choose. Different projects will require different abstractions, but we can handle that as developers. The role of the hardware vendor is to allow the software community to create these abstractions and debate them, and the best mechanism for that is adjustment of OpenCL.

Why Don’t I Just Do This Myself?

If this is such a great idea, why don’t I just go out there and develop my own abstractions on top of OpenCL? I wish that I could, but there are a couple of problems. First, OpenCL is unique in that it combines a host API with a programming language. If OpenCL were only a host API, I probably could develop a decent capability interface. The problem is that I can’t really touch OpenCL C unless I write my own compiler, and this is a major undertaking. It is also a waste of my time, because rather than working on abstractions and delivering software to customers, I am expending a considerable amount of time “fixing” the OpenCL standard without support from the standards body. Second, I have no idea how OpenCL is going to change. I would have to partition the OpenCL standards into capabilities myself, and there is no guarantee that my partitions will align with future plans for OpenCL. So, even if I develop my own OpenCL compiler and host API with capability support, changes to the standard might mess up how I divided capabilities. The only option I have is to make capabilities so fine-grained as to be useless.

I do strongly believe in the need for a capability interface, and so I am starting to partition the OpenCL standards into capabilities myself. However, without support from Khronos I will waste a lot of my time attempting to provide abstractions for a moving target. Hardware vendors also do not get the benefit of developing capabilities themselves in secret and providing them when they are ready. I do not think that my suggestion for capability support should be so controversial, because with the number of versions of OpenCL, the embedded and full profiles, the custom device and so on, it already does provide capabilities, just not in a nice way!
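To illustrate how existing OpenCL already carries capability-like information, the sketch below folds three device queries into one capability set. The inputs mirror what `clGetDeviceInfo` reports for `CL_DEVICE_EXTENSIONS` (a space-separated string of extension names), `CL_DEVICE_PROFILE` (`FULL_PROFILE` or `EMBEDDED_PROFILE`), and `CL_DEVICE_IMAGE_SUPPORT` (a boolean). The extension names are real Khronos extensions, but the capability names on the right and the function itself are hypothetical; this is one possible partition, not an agreed one.

```python
# Real OpenCL extension names mapped to made-up capability names.
EXTENSION_TO_CAPABILITY = {
    "cl_khr_fp16": "half_precision",
    "cl_khr_fp64": "double_precision",
    "cl_khr_global_int32_base_atomics": "atomics_int32_global",
    "cl_khr_local_int32_base_atomics": "atomics_int32_local",
}


def capabilities_from_device_info(extensions, profile, image_support):
    """Fold three separate device queries into one set of capabilities.

    Extensions without an entry in the table are ignored."""
    caps = {
        EXTENSION_TO_CAPABILITY[name]
        for name in extensions.split()
        if name in EXTENSION_TO_CAPABILITY
    }
    if profile == "FULL_PROFILE":
        caps.add("full_profile")
    if image_support:
        caps.add("images")
    return caps


# Example values of the kind a host program might read from a device.
example = capabilities_from_device_info(
    "cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_icd",
    "FULL_PROFILE",
    True,
)
```

The awkward part is visible even in this toy: the same kind of information arrives through three different mechanisms, and any mapping like this one could be invalidated by the next revision of the standard.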

Feedback

Please let me know what you think. I have carefully considered this topic, so if something isn’t clear please just ask for clarification in the comment section! Just say hello if you’d like, since writing these articles takes a considerable amount of my time.

References

[1] Khronos OpenCL Working Group. The OpenCL 1.2 Specification, revision 15. Web.

[2] Gaster, Benedict R. et al. Heterogeneous Computing with OpenCL, Second Edition. Morgan Kaufmann, 2012. Print.