Software Performance Cliff Dead Ahead: Apply OpenCL Now to Stop

Come and join me for an important talk on September 4 in Burlington, Ontario to learn more about what OpenCL can do for your business! Details are here.

Event Summary

Computer software will not run faster on new hardware today unless it is actively designed or refactored to do so. A modern mobile device, desktop, or server will contain many parallel processors from many different vendors, and each processor will have unique compute and energy characteristics. Applications that do not use processor resources efficiently will frustrate users with unnecessary delays and may drain batteries rapidly. These problems are magnified within compute clusters, which may have to pay for low processor efficiency with extra hardware and energy.

This talk is suitable for both technical and non-technical audiences, and will highlight the decisions made by two fictitious competing startup companies as each tries to build the best software in the market. Through these examples, you will understand the motivation for heterogeneous computing, recent processor hardware trends, and the role of the Khronos OpenCL standard for parallel and heterogeneous computing. At the end of this talk, you will be aware of new opportunities that may significantly impact your business success and software development plans.

“OpenCL offers enormous benefits for the computing community across many different platforms and markets.   AJ is both a practitioner of heterogeneous computing and a member of the Khronos OpenCL working group – and so is perfectly placed to communicate the intricacies and opportunities offered by this new open standard.” — Neil Trevett, Khronos President, OpenCL Chair, VP NVIDIA

SYCL 1.2: Provisional Standard Released and New Video

Today Khronos released the SYCL 1.2 provisional standard. You can read the specification and the press release. Because the standard is provisional, you can provide feedback about it on the forums.

You can also watch my YouTube video, which pitches the standard and explains where it fits in. The video is light on code details, because I want you to understand the basic ideas and then go read the specification for yourself.

Note that this video is unofficial and is neither endorsed by nor affiliated with Khronos. The views in it are my own, and the slight mistake and audio stumble at the end are entirely my fault.

Good news, everyone!

As of today I am officially a member of Khronos! This gives me the opportunity to work directly on the OpenCL standard. I look forward to working with the other members to shape the future of heterogeneous computing.

I have another two major announcements to make in the next month, so watch this space!

PS.  Sorry I have been too busy to record my next training videos according to my original schedule.  I am working on them in my free time, so please bear with me.

Why I Want Khronos OpenCL C++ for Devices

As you know, OpenCL is a standard for heterogeneous computing, and developers such as myself completely understand the importance of standardization. There are many issues with OpenCL in its current form, but despite my criticisms I respect that it is a standard. My strict adherence to standards can severely limit my productivity, because at this point the only option I have for programming OpenCL devices is OpenCL C.

There are people out there who really like C programming, and I understand many of their reasons. I personally find that I am very productive with C++, because I can use tools such as template metaprogramming, operator overloading, and classes to write concise yet efficient code. Despite my occasional frustration with C++, it allows me to work at a high level of abstraction, so that I can do things like design containers that change their implementation at compile time based on type traits. Unfortunately, OpenCL C forces me to work at a level where my productivity is greatly compromised, and developing OpenCL software becomes time-consuming.
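To make the type-traits idea concrete, here is a minimal sketch in ordinary host C++ (not OpenCL device code). All of the names (`FixedArray`, `BulkCopyStorage`, `ElementCopyStorage`, `Tracked`) are hypothetical and chosen only for illustration. The container picks a bulk `memcpy` implementation for trivially copyable element types and falls back to element-wise assignment otherwise, and the choice is made entirely at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical storage policy: bulk copy, valid only for trivially copyable T.
template <typename T, std::size_t N>
struct BulkCopyStorage {
    static constexpr bool uses_bulk_copy = true;
    T data[N];
    void assign(const T* src) { std::memcpy(data, src, N * sizeof(T)); }
};

// Hypothetical fallback policy: copy one element at a time via operator=.
template <typename T, std::size_t N>
struct ElementCopyStorage {
    static constexpr bool uses_bulk_copy = false;
    T data[N];
    void assign(const T* src) {
        for (std::size_t i = 0; i < N; ++i) data[i] = src[i];
    }
};

// The container selects its storage policy from a type trait at compile time.
template <typename T, std::size_t N>
class FixedArray {
    using Storage = typename std::conditional<
        std::is_trivially_copyable<T>::value,
        BulkCopyStorage<T, N>,
        ElementCopyStorage<T, N>>::type;
    Storage storage_{};

public:
    static constexpr bool uses_bulk_copy = Storage::uses_bulk_copy;

    void assign(const T* src) { storage_.assign(src); }

    const T& operator[](std::size_t i) const {
        assert(i < N);
        return storage_.data[i];
    }
};

// A user-provided copy assignment makes this type non-trivially copyable,
// so FixedArray<Tracked, N> uses the element-wise policy.
struct Tracked {
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked&) = default;
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        return *this;
    }
};
```

With these definitions, `FixedArray<int, 3>::uses_bulk_copy` is `true` and `FixedArray<Tracked, 3>::uses_bulk_copy` is `false`, yet both expose the same interface to the caller.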

AMD does have some form of C++ kernel language, which I will not use because the entire point of OpenCL is to maintain portability. Perhaps the AMD C++ kernel language can target OpenCL C; I admit I have not really investigated this. Khronos needs to provide an official OpenCL C++ language so that developers such as myself can get on with being productive and delivering value to customers.

OpenCL will likely never implement standard C++. Instead, some form of OpenCL C++ will have to be provided. We can’t assume that the hardware will support virtual functions, and we probably don’t want to impose an unnecessary hardware requirement. I am quite happy to have templates, classes, operator overloading, inheritance, and everything else without virtual anything. Standard C++ must be restricted, and rather than relying upon potentially contradictory restrictions imposed by various vendors targeting SPIR themselves, Khronos must take on the responsibility for OpenCL C++ support so that everyone agrees upon the restrictions imposed.
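As a rough illustration of how much remains usable without virtual functions, the sketch below uses only templates, classes, operator overloading, and non-virtual inheritance. It is plain host C++, not any actual OpenCL C++ kernel language, and every name in it (`Vec2`, `Transform`, `Scale`, `Translate`, `transform_all`) is hypothetical. Dispatch goes through the curiously recurring template pattern, so each call is resolved at compile time and no vtable is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Small value type with overloaded operators.
struct Vec2 {
    float x, y;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return Vec2{a.x + b.x, a.y + b.y}; }
inline Vec2 operator*(float s, Vec2 v) { return Vec2{s * v.x, s * v.y}; }

// CRTP base: apply() forwards to the derived class at compile time.
template <typename Derived>
struct Transform {
    Vec2 apply(Vec2 v) const {
        return static_cast<const Derived&>(*this).apply_impl(v);
    }
};

struct Scale : Transform<Scale> {
    float factor;
    explicit Scale(float f) : factor(f) {}
    Vec2 apply_impl(Vec2 v) const { return factor * v; }
};

struct Translate : Transform<Translate> {
    Vec2 offset;
    explicit Translate(Vec2 o) : offset(o) {}
    Vec2 apply_impl(Vec2 v) const { return v + offset; }
};

// Works with any Transform<T>; the concrete type is known to the compiler.
template <typename T>
void transform_all(const Transform<T>& t, Vec2* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) data[i] = t.apply(data[i]);
}

// Neither type has virtual functions, so neither is polymorphic.
static_assert(!std::is_polymorphic<Scale>::value, "Scale has no vtable");
static_assert(!std::is_polymorphic<Translate>::value, "Translate has no vtable");
```

The trade-off is that the set of transforms must be known at compile time, so a heterogeneous collection behind a single base pointer is not available. That is exactly the kind of restriction a device language without virtual dispatch would imply.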

If OpenCL C++ is provided, I am confident that I can leverage my OpenCL programming experience to develop beautiful containers and libraries.

Calling All OpenCL HPC Customers!

I have been very busy working on a few software projects to really change how professional OpenCL development is done.

The first product will be out at the end of February, and I am looking for people to try it out and test it.   If you contact me and ask for a preview I will send you a sneak peek!

The second product is my vision for how HPC software should be developed.  I am developing a new approach to HPC software, and companies that want a strategic advantage should get in touch with me to discuss how they can get access to it.

AMD Newsletter Spot and Upcoming Videos

I am very pleased to be featured in the AMD Developer Central Newsletter. Here is an outline of the next few videos in the series:

  • Parallel Programming I: Describes what is different about parallel programming and how you should think about parallel algorithms.
  • Parallel Programming II: Synchronization and memory issues with examples of parallel algorithms.
  • Parallel Programming III: Advanced topics and advice on parallel algorithm design.
  • Performance Programming: How to get really good performance from OpenCL and some software engineering advice for OpenCL projects.

I have been reading Structured Parallel Programming for ideas on how to present parallel programming to you.  The next three videos on parallel programming will be done together, and I hope to have them out by the middle of this month.  On average, I will reach my target of one video per month.

OpenCL Training Material: Seeking Sponsorship

I am searching for an organization to sponsor my training videos. Developing these lectures takes a great deal of my time, and I am providing them for free. You can save money by watching my lectures rather than paying for expensive professional training. So far my lectures have covered background material and have not yet demonstrated how to write real-life applications. I assure you that more interesting topics are coming, but as you can appreciate, a solid foundation is required first.

Where is this training material going?  Well, I see myself preparing somewhere between 10 and 20 videos covering a complete range of topics.  At the end, you should be an OpenCL and GPU programming expert, though perhaps a bit inexperienced.  You will understand how to tune algorithms to the hardware, and how to do good software engineering with OpenCL.  I will also show you how to use OpenCL within existing applications.

Please contact me if you are interested in sponsoring this work.  I would be happy to include a small advertisement for your organization within my training videos.