Cached Shadows

Benchmarks are REALLY messy things.

  • It’s easy to make something seem like it performs 10x faster than an alternative by toying with inputs and cherry-picking results.
  • It’s easy to get motivated by one big number (like a 5-10x speedup on one input) and imagine that it’s possible on all inputs. Especially with C++ compilers, performance is rarely consistent or predictable across inputs.
  • Initial “wow!”-level improvements tend to shrink as you get closer to real-world use cases (due to things like cache locality).
  • Even with a high n, benchmarks are noisy and vary from run to run (you mustn’t run other CPU-intensive things while running them).
  • Averages obscure outliers, and outliers matter, especially in DSP, but even in UI.
  • Results differ between machines. (I swear a fresh restart of my Apple M1 made vImage run faster!)

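As a rough illustration of the point about averages, here is a minimal C++ sketch (the `TimingSummary` struct and `summarize` function are hypothetical, not part of any benchmark harness used here) that reports the mean alongside the median and the worst run. A single slow run moves the mean a lot while the median barely notices, so it helps to look at more than one number.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of per-run timings, all in microseconds.
struct TimingSummary
{
    double mean = 0.0;
    double median = 0.0;
    double max = 0.0;
};

// Takes the samples by value so sorting doesn't disturb the caller's data.
// Returns all zeros for an empty input.
TimingSummary summarize (std::vector<double> samples)
{
    TimingSummary s;
    if (samples.empty())
        return s;

    std::sort (samples.begin(), samples.end());
    const std::size_t n = samples.size();

    s.mean = std::accumulate (samples.begin(), samples.end(), 0.0) / static_cast<double> (n);
    // Median: the middle element, or the mean of the two middle elements for even n.
    s.median = (n % 2 == 1) ? samples[n / 2]
                            : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    s.max = samples.back();
    return s;
}
```

With four made-up runs of 10 µs and one 1000 µs spike, `summarize` gives a mean of 208, a median of 10 and a max of 1000: the mean describes none of the runs, and only the max reveals the spike that would cause an audio dropout or a dropped UI frame.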
So, here are some cherry-picked benchmarks. The Windows machine is an AMD Ryzen 9 and the Mac is an M1 MacBook Pro.

In all cases, the images are square (e.g. 50x50px) and times are in µs (microseconds, millionths of a second), averaged over 100 runs.

So a value of 1000 means 1 ms. Please open issues if you see discrepancies or want to contribute to the benchmarks.
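For readers who want to reproduce numbers in the same units, here is a minimal timing sketch using `std::chrono::steady_clock`. It is not the harness behind the charts in this post; `timeRunsMicroseconds`, `average` and `microsecondsToMilliseconds` are hypothetical helpers that record each run in µs, average them, and convert to ms.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <ratio>
#include <vector>

// Runs `fn` `runs` times and returns each run's duration in microseconds.
template <typename Fn>
std::vector<double> timeRunsMicroseconds (Fn&& fn, std::size_t runs = 100)
{
    using clock = std::chrono::steady_clock; // monotonic, unaffected by wall-clock changes
    std::vector<double> samples;
    samples.reserve (runs);

    for (std::size_t i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        fn();
        const auto end = clock::now();
        samples.push_back (std::chrono::duration<double, std::micro> (end - start).count());
    }
    return samples;
}

// Plain arithmetic mean of the samples (0 for an empty input).
double average (const std::vector<double>& samples)
{
    if (samples.empty())
        return 0.0;
    return std::accumulate (samples.begin(), samples.end(), 0.0) / static_cast<double> (samples.size());
}

// 1000 µs == 1 ms
constexpr double microsecondsToMilliseconds (double us) { return us / 1000.0; }
```

Keeping every per-run sample, rather than only a running total, means the same data can feed both the average and an outlier check. In a real benchmark you would also need to stop the compiler from optimizing away the work under test, which libraries like Google Benchmark handle with `benchmark::DoNotOptimize`.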

Cached Drop Shadows

My #1 performance goal with this library was for drop-shadows to be screaming fast.

99% of the time I’m rendering single channel shadows for vector UI.
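A single-channel shadow depends on only a handful of parameters, so once it has been rasterized and blurred it can be reused until one of them changes. The sketch below illustrates that general idea only; it is not Melatonin Blur's implementation. `ShadowParams`, `Mask` and `CachedShadow` are hypothetical names, and this sketch assumes that offset and colour are applied when compositing the cached mask, so they are left out of the cache key.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical parameters that fully determine the blurred mask in this sketch.
// Offset and colour are assumed to be applied at composite time, so they are not part of the key.
struct ShadowParams
{
    int width = 0, height = 0; // size of the shape's bounds in pixels
    int radius = 0;            // blur radius
    int spread = 0;

    bool operator== (const ShadowParams& other) const
    {
        return width == other.width && height == other.height
            && radius == other.radius && spread == other.spread;
    }
};

using Mask = std::vector<std::uint8_t>; // single-channel (alpha) pixels

class CachedShadow
{
public:
    using Renderer = std::function<Mask (const ShadowParams&)>;

    explicit CachedShadow (Renderer r) : render (std::move (r)) {}

    // Re-renders only when the parameters change; otherwise returns the cached mask.
    const Mask& get (const ShadowParams& params)
    {
        if (! valid || ! (params == cachedParams))
        {
            mask = render (params); // the expensive part: rasterize the shape and blur it
            cachedParams = params;
            valid = true;
            ++renderCount;
        }
        return mask;
    }

    int renders() const { return renderCount; }

private:
    Renderer render;
    ShadowParams cachedParams;
    Mask mask;
    bool valid = false;
    int renderCount = 0;
};
```

Repainting a component with an unchanged shape then costs a comparison plus a composite of the cached mask instead of a fresh blur, which is why the blur algorithm itself matters far less once caching is in place.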

Caching does most of the heavy lifting here, giving a 10-30x improvement over using just StackBlur. On macOS:

On Windows:

Note: I haven’t included JUCE’s DropShadow class. That’s partly because it isn’t compatible with design programs like Figma or standards like CSS.

But it’s also 20-30x slower than StackBlur and up to 500x slower than Melatonin Blur.

To show this clearly, the time axis (in µs) has to be logarithmic:
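A little arithmetic on the worst case quoted above (up to 500x) shows why a linear axis doesn't work. If Melatonin Blur takes time t and JUCE's DropShadow takes 500t, then on a linear axis the faster bar is only 0.2% of the height of the slower one, while on a log axis the two are separated by a readable 2.7 decades:

```latex
\frac{t_{\text{Melatonin}}}{t_{\text{JUCE}}} = \frac{1}{500} = 0.2\%
\qquad
\log_{10} t_{\text{JUCE}} - \log_{10} t_{\text{Melatonin}} = \log_{10} 500 \approx 2.70
```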

