Intel’s IPP is a free-to-use set of fast vector and matrix libraries available across platforms. It’s a great match for working with DSP and images in the JUCE framework, particularly on Windows.
Although IPP works on Intel Apple hardware, Apple has its own cross-architecture high performance library called Accelerate, which includes vDSP and vImage functions.
Let’s navigate the complexity and figure out how to reliably get fast vector and matrix math in our JUCE projects on Windows, including in CI.
Intel has something called “oneAPI”, an ambiguous umbrella term for an insane amount of tooling, frameworks and APIs for data science, performance and machine learning.
Both IPP (Integrated Performance Primitives) and MKL (Math Kernel Library) can be installed through oneAPI.
JUCE has some loose support in the Projucer for including both IPP and MKL, and there’s been some confusion, as there’s overlap between the two.
Your choice on what to install and use depends completely on what functions you need access to.
In most cases, IPP is enough. It lets you do a lot with vectors, the same sorts of things that are in JUCE’s own
MKL is… broader and more flexible, but it does have some useful vector operations as well. Here’s a listing of what’s available in MKL’s vector math.
We’ll install both in this blog post, to make life flexible. Flexible is good.
JUCE has some feature detection around parallel/sequential versions of IPP, but parallel is deprecated.
CMake support for IPP doesn’t really exist in JUCE, as of this post.
JUCE does provide FloatVectorOperations as a wrapper around a subset of Apple’s Accelerate. However, JUCE sidesteps IPP in favor of working directly with some custom SSE vectorization.
Vectorization would perhaps be more efficiently and comprehensively handled by IPP. There’s currently an open Feature Request to ask JUCE to improve IPP support for this reason.
Although oneAPI is free to use, it’s not free to publicly distribute, which means the IPP package must be downloaded and installed on each CI run if your code is public.
Intel kindly provides some examples for installing and using IPP in CI — although they are quite overcomplicated for our needs.
The examples contain the “offline installer” URLs we’ll need as well as the magic incantations to run the installer silently — quite useful! You can also find links to the “offline” downloads over on Intel’s website.
We’ll be using Intel’s bootstrapper.exe on Windows to pick the components of oneAPI to install. You can see a list of the available components here.
We’ll stick to just installing IPP and MKL.
The latest version of IPP (as of February 2023) is from 2021, but MKL has a recent (2023) release.
Here’s an example of silently running the 2023 installer and selecting IPP and MKL. You can run this locally to get set up:
# Download 3.5 gigs of Intel madness
curl --output oneapi.exe https://registrationcenter-download.intel.com/akdlm/irc_nas/19078/w_BaseKit_p_2023.0.0.25940_offline.exe

# Extract the bootstrapper
./oneapi.exe -s -x -f oneapi

# Silently install IPP and MKL, note the ":" separator
./oneapi/bootstrapper.exe -s -c --action install --components=intel.oneapi.win.ipp.devel:intel.oneapi.win.mkl.devel --eula=accept -p=NEED_VS2022_INTEGRATION=1 --log-dir=.

# List installed components
./oneapi/bootstrapper.exe --list-components
You should see output confirming that both IPP and MKL were installed, and you should be able to verify the Intel libraries are now on disk:
If you’ve previously installed something from oneAPI, you might have to first run
I highly recommend caching this install in CI vs. trying to bundle with your product, as the download is huge (~3.5 gigs). On GitHub Actions, the download and install takes 25 minutes. Cached, it’s 180MB and takes 10 seconds to restore.
If working in GitHub Actions, I recommend taking advantage of actions/cache/save. The default actions/cache will not save the cache if any later step fails, which is pretty annoying when you’re working on your CI pipeline.
actions/cache/save allows you to just run the download and install once, no matter what happens afterwards.
The fact that oneAPI is not open source (just free) adds some friction to getting started (as well as to filing bugs with Intel!).
However, Intel does provide CMake helpers to help locate and install IPP. After installing on Windows, you can find them in
C:\Program Files (x86)\Intel\oneAPI\ipp\latest\lib\cmake\ipp
Supposedly their CMake exports a variable called IPP_LIBRARIES containing a list of targets you can link against. However, it didn’t work for me, nor for others.
However, you can directly link individually exported targets to your target in CMake. All you need to do is name them like so:
find_package(IPP)
if(IPP_FOUND)
    target_link_libraries(MyProject PRIVATE IPP::ippcore IPP::ipps IPP::ippi IPP::ippcv)
endif()
A list of available IPP libraries can be found in the CMake config:
The MKL CMake seems a bit more sophisticated and includes better output. There’s also a set of CMake OPTIONS such as ENABLE_BLACS. And they have an insane (in a bad way) “Link Advisor” to help you manually pick what to link against.
In CMake you can link to MKL::MKL, which is what I did for now. I only needed their vector math library (VML), but couldn’t figure out how to link against it directly.
So far, we’ve been proceeding with static linkage: the Intel libraries are being built into your binary. Intel does support dynamic linkage as well, but I’m not clear on when that would be a good decision.
The whole magic behind IPP is that it feature detects what CPU is in use at run time and picks the best primitive to use via its dispatching.
You used to have to call ippStaticInit to set up the dispatcher, but no longer:
“Since Intel® IPP 9.0, there is no requirement to initialize the dispatcher by calling ippInit() for static linkage anymore.”
See docs, assuming Intel didn’t break their web documentation again.
You can call the function ippGetCpuFeatures to confirm the dispatcher is happy.
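As a rough sketch, a dispatcher sanity check could look something like the following. It assumes IPP’s umbrella header ipp.h is on the include path and that IPP is linked. HAVE_IPP and ippDispatcherLooksHealthy are hypothetical names made up for this example, and the __has_include guard (C++17) is only there so the snippet still compiles on a machine without IPP.

```cpp
#include <cstdio>

// HAVE_IPP is a hypothetical macro for this sketch, not something IPP or JUCE defines.
#if defined(__has_include)
  #if __has_include(<ipp.h>)
    #include <ipp.h>
    #define HAVE_IPP 1
  #endif
#endif
#ifndef HAVE_IPP
  #define HAVE_IPP 0
#endif

// Hypothetical helper: returns true when IPP reports its CPU features without error.
bool ippDispatcherLooksHealthy()
{
#if HAVE_IPP
    // Ask IPP which CPU features it detected. The second argument (raw CPUID
    // registers) isn't needed here, so a null pointer is passed.
    Ipp64u featureMask = 0;
    const IppStatus status = ippGetCpuFeatures (&featureMask, nullptr);

    // The library version struct names the build that is actually linked.
    const IppLibraryVersion* version = ippGetLibVersion();
    std::printf ("%s %s, feature mask 0x%llx\n", version->Name, version->Version,
                 static_cast<unsigned long long> (featureMask));

    return status == ippStsNoErr;
#else
    std::puts ("IPP headers not found, nothing to check");
    return false;
#endif
}
```

Calling this once at startup (or from a unit test in CI) is a cheap way to confirm that the install and link steps actually worked.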
Both vDSP and Intel IPP have very specific and (thankfully) consistent naming practices.
With IPP, if you want to do something like fill a vector, you can visit the documentation for something like the Set function and see there are a dozen manifestations of the function call.
Most of these are just for different data types, such as ippsSet_32f, which is used with floats, ippsSet_32s for ints, etc.
Unlike vDSP, there are also specific “in place” descriptors. You’ll see many function variants with _I appended to them, like ippsAdd_32f_I. These don’t take a separate output vector; they are guaranteed to modify the input vector in place.
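To make the naming concrete, here is a minimal sketch comparing the out-of-place ippsAdd_32f with the in-place ippsAdd_32f_I, plus ippsSet_32f for filling a float vector. The wrapper names (filled, addOutOfPlace, addInPlace) and the HAVE_IPP macro are hypothetical, invented for this example. The scalar fallback loops are only there so the snippet compiles without IPP installed; they are not how IPP works internally.

```cpp
#include <cstddef>
#include <vector>

// HAVE_IPP is a hypothetical macro for this sketch, not something IPP or JUCE defines.
#if defined(__has_include)
  #if __has_include(<ipp.h>)
    #include <ipp.h>
    #define HAVE_IPP 1
  #endif
#endif
#ifndef HAVE_IPP
  #define HAVE_IPP 0
#endif

// Returns a vector of `length` floats, all set to `value`.
std::vector<float> filled (float value, int length)
{
    std::vector<float> v (static_cast<std::size_t> (length));
#if HAVE_IPP
    ippsSet_32f (value, v.data(), length); // 32f = 32-bit float
#else
    for (auto& x : v) x = value;
#endif
    return v;
}

// Out of place: writes a + b into a separate destination vector.
// Assumes a and b have the same length.
std::vector<float> addOutOfPlace (const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> dst (a.size());
#if HAVE_IPP
    ippsAdd_32f (a.data(), b.data(), dst.data(), static_cast<int> (dst.size()));
#else
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = a[i] + b[i];
#endif
    return dst;
}

// In place (_I suffix): adds src into srcDst, overwriting srcDst.
// Assumes src and srcDst have the same length.
void addInPlace (const std::vector<float>& src, std::vector<float>& srcDst)
{
#if HAVE_IPP
    ippsAdd_32f_I (src.data(), srcDst.data(), static_cast<int> (srcDst.size()));
#else
    for (std::size_t i = 0; i < srcDst.size(); ++i) srcDst[i] += src[i];
#endif
}
```

Note the argument order: the out-of-place variant takes two sources and a destination, while the _I variant takes one source and a combined source/destination buffer.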
You can see an example in CI in Pamplejuce, my JUCE template repo. It downloads, installs, caches, tests and exposes the PAMPLEJUCE_MKL preprocessor definitions when they are available.
As far as the Intel APIs themselves…good luck!
It’s somewhat of a mess, not exactly developer friendly. Good documentation and examples are few and far between.
See a recent post of mine where I try to get Gaussian blur working. Apple’s Accelerate is currently much nicer and more succinct…