Product updates | 26/5/2025

ZibraVDB Performance and Quality Optimization

ZibraVDB was created to bring full-resolution volumetric effects to Unreal Engine in real time. While it runs well on consumer-grade hardware at high frame rates, different users pursue different goals in their projects: Virtual Production and CGI studios put quality first and accept lower frame rates on high-end hardware, while game studios take a performance-first approach, simplifying or even cutting expensive visual features during development.

In this article, we’d like to cover how ZibraVDB works inside Unreal Engine, how to measure its performance, which steps ZibraVDB introduces to Unreal Engine, and how you can tune your content to meet quality and performance goals. Also, we will cover the most recent ZibraVDB version 1.3.0, which brings a major performance uplift since the initial release.

As we have stated before, ZibraVDB's performance depends on raw GPU compute power; it does not use any vendor-specific tensor or ray tracing cores. This means we are not vendor-locked and can run on any modern GPU, including mobile phones.

We used several GPUs to measure performance, choosing the most typical user hardware to give you the closest-to-real-life picture. For actual performance testing, we used these GPUs:

  1. NVIDIA GeForce RTX 4090 Laptop as the core part of an MSI Stealth 16 with a Core Ultra 9 185H, 32 GB RAM, and a 2 TB SSD. This is the ultimate GPU you can get for working with Unreal Engine. The laptop RTX 4090 uses AD103 silicon, the same chip as the desktop RTX 4080 and RTX 4080 Super. Even the most recent RTX 5090 barely outperforms the previous flagships, so you can expect similar performance from it.
  2. NVIDIA GeForce RTX 3090 24 GB uses GA102 silicon, which also powers the NVIDIA RTX A6000, the only difference being that the A6000 has twice as much, slightly slower VRAM.
  3. AMD RX 5700 XT 8 GB is the closest analog to the PlayStation 5 GPU: they have the same compute power and the same bandwidth. On this GPU, we usually measure all gaming-related content.

{{table-1="/components"}}

Bandwidth Bottleneck

A primary obstacle to using VDB is its large file size. The time required to stream VDB frames from disk to memory can exceed the available frame time for real-time rendering. While compression reduces file size, rendering compressed data necessitates a decompression step. This decompression needs to occur directly into RAM or VRAM to avoid the overhead of streaming full-resolution frames. ZibraVDB is specifically engineered to achieve this.

Our high-speed decompressor effectively eliminates the bandwidth bottleneck, shifting the performance limitation to rendering, which is the desired scenario. The decompression process involves the following steps:

1. VDB Sequence Compression

The initial step is to compress the desired real-time effect. ZibraVDB employs a custom file format that consolidates all frames into a single `.zibravdb` file.

2. Loading Compressed Frames into RAM

In the real-time application, the next set of compressed frames to be rendered is read from the `.zibravdb` file and loaded into RAM. It's not necessary to load all frames simultaneously; instead, specific frames for rendering can be selected.

3. Passing a Frame to the Decompressor

Once a compressed frame is in memory, it's ready for decompression. The ZibraVDB decompressor is implemented as a compute shader. A single frame is taken from RAM and passed to this decompressor.

4. Direct Decompression to VRAM

The decompressor processes the compressed data and directly stores the decompressed result in VRAM, making it immediately available for rendering.

Notably, the decompressor is format-agnostic. In the Unreal Engine integration, ZibraVDB decompresses data directly into a sparse texture.
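
The four steps above can be sketched as a simple streaming loop. This is a minimal, self-contained illustration, not the actual ZibraVDB API: `zlib` stands in for the custom codec, and the class and function names are hypothetical.

```python
import zlib

# Minimal sketch of the streaming pipeline described above. zlib stands in
# for ZibraVDB's custom codec; all names here are hypothetical.

class CompressedSequence:
    """Step 1: all frames of an effect consolidated into a single container."""
    def __init__(self, raw_frames: list[bytes]):
        self.frames = [zlib.compress(f) for f in raw_frames]

    def read_frame(self, index: int) -> bytes:
        # Step 2: load only the compressed frame we need into RAM.
        return self.frames[index]

def gpu_decompress(compressed: bytes) -> bytes:
    # Steps 3-4: in the real plugin this is a compute shader writing the
    # decompressed volume directly into VRAM; here it is a CPU stand-in.
    return zlib.decompress(compressed)

# Usage: stream and decompress one frame at a time, never the whole sequence.
raw = [bytes([i]) * 1024 for i in range(4)]   # four fake 1 KB volume frames
seq = CompressedSequence(raw)
frame = gpu_decompress(seq.read_frame(2))
assert frame == raw[2]
```

The key property the sketch preserves is that only the frame about to be rendered is ever held uncompressed, which is what keeps the bandwidth cost bounded.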

ZibraVDB in Unreal Engine

To calculate the cost of ZibraVDB in the rendering pipeline, check several markers in the Unreal Engine profiler:

  1. Decompression — this step decompresses the volume into raw data.
  2. Prerender — includes building the sparse volumetric data texture and calculating illumination at a downscaled resolution.
  3. Ray Marching — visualization of the volume. An important note: this pass is part of Unreal Engine's Translucency pass and gives you the aggregate time for all translucent object rendering in the scene, including all ZibraVDB effects.

Let’s take a look at all these stages and how they perform on different hardware. For the next tests, we will be using a single frame from the explosion effect in multiple variations that look like this:

{{video-1="/components"}}

Decompression

How quality affects decompression speed

ZibraVDB's compressor uses a single Quality parameter (0-1) to control the compression level. This lossy compression method means that lower quality settings result in more artifacts but smaller file sizes. The default Quality of 0.6 achieves a 40x compression rate, suitable for many applications. Decompression speed is influenced by the size of the compressed effect. Below is a comparison of decompression performance across various Quality levels and effects.
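
As a quick sanity check on what a 40x ratio means in practice, here is the arithmetic applied to the test frames used below (the 40x figure is the default-Quality average from above; exact ratios vary per effect):

```python
# Back-of-the-envelope effect of the ~40x compression rate at the default
# Quality of 0.6. Frame sizes come from the article's test data.

def compressed_size_mb(raw_frame_mb: float, ratio: float = 40.0) -> float:
    return raw_frame_mb / ratio

frame_mb = 45.5                  # single-channel test frame from the article
per_frame = compressed_size_mb(frame_mb)
sequence = 100 * per_frame       # a hypothetical 100-frame sequence
print(f"{per_frame:.2f} MB per frame, {sequence:.1f} MB for 100 frames")
```

A full sequence that would take gigabytes of raw VDB data fits in roughly a hundred megabytes on disk, which is what makes per-frame streaming feasible in the first place.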

Single channel effect

Here are the decompression speed metrics for different Quality values on a 45.5 MB VDB frame:

3-channel effect

Here are the decompression speed metrics for different Quality values on a 106 MB VDB frame:

For further performance measurements, we will use the Quality value of 0.6. It offers the best middle ground between file size and visual quality.

How volume resolution affects decompression speed

Let’s also see how the number of voxels affects the decompression performance of ZibraVDB. We took the same 106 MB VDB frame from the previous tests and created downscaled versions of it:

Prerender Stage

During the prerender stage, ZibraVDB performs all necessary computations for raymarching. If a frame has changed, the sparse structure is adapted accordingly. Subsequently, the illumination of the effect and shadow masks for each light casting volumetric shadows are calculated.

Illumination calculation is crucial as it defines the effect's visual appearance. Computing illumination during raymarching would be computationally prohibitive for real-time applications. Therefore, in real-time volumetric renderers, illumination is calculated beforehand and stored in a separate texture, which is then sampled during ray marching.

Typically, illumination is calculated at a lower resolution to optimize performance. ZibraVDB for Unreal Engine (UE) defaults to a ¼ resolution illumination texture, adjustable within the actor's properties. This approach significantly reduces memory usage while maintaining adequate visual quality.
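
To see why a downscaled illumination texture is so much cheaper, note that a ¼-resolution texture holds only 1/64 of the voxels (assuming, as we read it, that the scale applies per axis):

```python
# Memory footprint of a downscaled illumination texture relative to the full
# volume. Assumes the 1/4 scale applies per axis; the 512^3 effect size is a
# hypothetical example, not a ZibraVDB internal.

def illumination_fraction(scale: float) -> float:
    return scale ** 3                  # volumes are 3D, so the saving is cubic

full_voxels = 512 ** 3
lit_voxels = full_voxels * illumination_fraction(0.25)
print(f"illumination texture holds {lit_voxels / full_voxels:.4%} of the voxels")
```

The cubic relationship is why even a modest per-axis downscale cuts illumination memory so drastically while the visual cost stays small.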

ZibraVDB for UE supports up to 15 light sources affecting volumes, including any combination of point, spot, and directional lights.

{{square-video="/components"}}

Generally, the prerender stage is more performance-intensive than decompression due to the heavy computations involved. Performance is significantly improved if the ZibraVDB frame remains unchanged, negating the need to rebuild the sparse structure.

Ray Marching

The raymarching process is standard, with performance directly proportional to the effect's resolution and the number of steps taken. ZibraVDB allows independent adjustment of raymarching parameters for each effect instance, enabling fine-tuning of rendering performance.
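
Since cost is roughly proportional to resolution and step count, a first-order cost model looks like the sketch below. This is a hedged simplification of our own: it ignores sparsity, empty-space skipping, and early ray termination, so it is only useful for comparing settings against each other.

```python
# First-order raymarching cost model: samples ~ covered pixels * steps per ray.
# A simplification for comparing settings, not ZibraVDB's actual cost formula.

def raymarch_samples(pixels_covered: int, steps: int) -> int:
    return pixels_covered * steps

base = raymarch_samples(1920 * 1080, 128)
halved_steps = raymarch_samples(1920 * 1080, 64)
assert halved_steps * 2 == base        # halving the step count halves the work
```

In practice this means the per-instance raymarching parameters give you a nearly linear performance dial for each effect.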

ZibraVDB for UE renders effects using a translucent material that incorporates a custom raymarcher; the material is applied to a cube mesh. Below are the metrics for rendering the effect at 1920x1080 resolution:

ZibraVDB can also employ Heterogeneous Volume (HV) for rendering. While HV is approximately three times slower than ZibraVDB's custom renderer, it remains a viable option when path tracing is desired.

Now let’s sum everything up:

As you can see, with ZibraVDB the rendering of volumetric effects is not limited by decompression, and performance mostly depends on the implementation of the renderer.

It is important to mention that these metrics do not represent the average performance of ZibraVDB in a scene but rather the worst-case scenario. Most VDB effects are simulated at 30 FPS, meaning that in a real-time 60 FPS application the ZibraVDB frame changes only every second game frame. On the frames in between, decompression and sparse-texture rebuilding can be skipped, resulting in much smaller frame times.
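
The 30 FPS simulation vs. 60 FPS rendering point can be made concrete with a tiny helper that maps game frames to VDB frames. This is a sketch of the idea only; the plugin's actual frame-mapping logic may differ.

```python
# Map game frames rendered at `render_fps` onto a VDB sequence simulated at
# `sim_fps`. When the VDB frame index repeats, decompression and sparse-texture
# rebuilding can be skipped. Illustrative only, not ZibraVDB's actual code.

def vdb_frame(game_frame: int, render_fps: int = 60, sim_fps: int = 30) -> int:
    return game_frame * sim_fps // render_fps

previous = -1
skipped = 0
for game_frame in range(120):          # two seconds of game time at 60 FPS
    current = vdb_frame(game_frame)
    if current == previous:
        skipped += 1                   # same VDB frame: skip decompression
    previous = current

assert skipped == 60                   # every second game frame is skipped
```

With a 30 FPS simulation under 60 FPS rendering, exactly half of all game frames can reuse the previous decompressed volume.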

Some applications might require a stable frame time. For these cases, users can force decompression to run even when the ZibraVDB frame hasn’t changed by disabling the Allow Skip Decompression parameter on the ZibraVDB actor.

ZibraVDB for UE also provides frustum and distance culling, allowing it to skip any computations when an effect is out of view.
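
Distance culling in particular is simple to reason about. The sketch below shows the general idea; the function and parameter names are made up for illustration and are not the actor's actual properties.

```python
import math

# Generic distance-culling check: skip all per-effect work when the effect is
# farther from the camera than a cutoff. Names are illustrative only.

def should_cull(camera_pos, effect_pos, max_distance: float) -> bool:
    return math.dist(camera_pos, effect_pos) > max_distance

assert should_cull((0, 0, 0), (0, 0, 5000), max_distance=3000)
assert not should_cull((0, 0, 0), (0, 0, 1000), max_distance=3000)
```

When the check passes, decompression, prerender, and raymarching can all be skipped for that effect, so the cost of an off-screen volume drops to nearly zero.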

Memory usage

Memory over volume resolution

Let’s see how the size of the effect correlates with the amount of video memory required for rendering:

As you can see, the amount of required VRAM is 2-3 times bigger than the size of the frame, because additional resources are needed for rendering. ZibraVDB does not reallocate memory every frame; instead, it allocates just enough memory to fit the biggest frame.

Those numbers are approximate and depend a lot on the rendering settings of the effect. For example, you can save some memory by disabling voxel interpolation or lowering illumination resolution.
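
The 2-3x rule of thumb can be captured in a tiny estimator. The multiplier range comes from the measurements above; the 2.5 default is simply the midpoint, our own choice rather than an official figure.

```python
# Rough VRAM estimate for rendering a ZibraVDB effect, based on the article's
# observation that required VRAM is roughly 2-3x the biggest frame's size.
# The 2.5 default is just the midpoint of that range, not an official value.

def estimate_vram_mb(biggest_frame_mb: float, multiplier: float = 2.5) -> float:
    return biggest_frame_mb * multiplier

lo, hi = estimate_vram_mb(106, 2.0), estimate_vram_mb(106, 3.0)
print(f"106 MB frame -> roughly {lo:.0f}-{hi:.0f} MB of VRAM")  # 212-318 MB
```

Remember that the real number moves with the effect's rendering settings, so treat this as a budgeting aid, not a guarantee.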

Memory breakdown

So, what exactly is the memory spent on? Under the hood, ZibraVDB tries to reuse resources as much as possible to avoid additional memory allocations. Here are the 3 main groups of resources.

  • Decompression resources. The decompressed frame and auxiliary buffers needed for decompression.
  • Illumination textures. Downscaled textures that store the calculated illumination of the effect. For the image below, we used ¼ downscale.
  • Sparse textures. Full-res sparse representation of the effect with applied illumination. These textures are sampled by the ray marcher during rendering.

ZibraVDB 1.3.0 vs. 1.2.4

Alongside new features, ZibraVDB 1.3.0 introduces the biggest performance improvements since the initial release. It took almost three months of thorough profiling to determine hot spots and ways to mitigate them. We already had a pretty fast renderer, so this time we focused on the most time-consuming parts: the Decompression and Prerender stages. The key problem was that these stages hit very different bottlenecks on different GPU architectures. Since we mostly target high-end NVIDIA and console-class AMD hardware, we made an extensive analysis using all the vendor-specific tools to figure out the best approach to optimization. We made numerous changes to memory access patterns, Local Data Share usage, register pressure, and SM throughput, and achieved impressive results.

To measure the performance improvement in the most recent patch, we slightly changed the testing approach in this section. For the AMD RX 5700 XT, we took relatively small effects of 127 and 166 frames, up to 4 GB each, close to gaming use cases and not limited by VRAM size. These effects are Mid-Air Explosion and High-Res Smoke Plume from the JangaFX free VDB pack.

{{video-2="/components"}}

For high-end GPUs, we took a 180-frame, 10 GB effect, simulated in Houdini using Axiom, which is closer to what Virtual Production and CGI use.

{{video-3="/components"}}

And here are metrics for those effects:

These numbers are interesting not only as a measure of achievement; they also give some hints about how different GPUs behave:

  1. NVIDIA users gain a massive boost in decompression on big effects. We did not expect such a huge uplift. In practice, it means that with the new update you can use a higher compression Quality and still get better performance than before.
  2. NVIDIA Ada-generation GPUs (RTX 6000 Ada, 40xx, and 50xx series) seem to be less prone to memory bandwidth issues thanks to their large L2 caches, but the RTX A6000, based on the previous NVIDIA Ampere architecture, definitely benefits from this update.
  3. The AMD RX 5700 XT and NVIDIA Ampere-based GPUs got the biggest performance uplift of all the GPUs we tested, and we strongly recommend that gaming studios get this update as well.

This is not the last performance update. We are constantly improving the user experience, not only in terms of features: we always consider the performance cost of a feature before introducing it.

Join our community and use ZibraVDB in your projects. We are eager to hear your feedback, especially if you are missing some functionality or are still unsatisfied with the current ZibraVDB performance.