On my machine, this reduces incremental compilation time after an edit of
`rendering_server.h` by 1s, and paves the way for more decoupling in
rendering code.
A number of headers in the codebase included `rendering_server.h` just for
some enum definitions. This means that any change to `rendering_server.h` or
one of its dependencies would trigger a massive incremental rebuild.
With this change, we decouple a number of classes from `rendering_server.h`,
greatly speeding up incremental rebuilds for that area.
On my machine, this reduces incremental compilation time after an edit of
`rendering_server.h` by 60s (from 2m57s).
This allows the ARG to better optimize their usage for mobile devices.
In particular, this allows the ARG to avoid loading the contents of the texture into tile memory when rendering into the texture if the contents won't be used anyway.
Mobile devices are typically bandwidth bound which means we need to do as few texture samples as possible.
They typically use TBDR GPUs which means that all rendering takes place on special optimized tiles. As a side effect, reading back memory from tile to VRAM is really slow, especially on Mali devices.
This commit uses a technique where you do a small blur while downsampling, and then another small blur while upsampling to get really high quality glow. While this doesn't reduce the renderpass count very much, it does reduce the texture read bandwidth by almost 10 times. Overall glow was more texture-read bound than memory write, bound, so this was a huge win.
A side effect of this new technique is that we can gather the glow as we upsample instead of gathering the glow in the final tonemap pass. Doing so allows us to significantly reduce the cost of the tonemap pass as well.
This work is a heavily refactored and rewritten from TheForge's initial
code.
TheForge's original code had too many race conditions and was
fundamentally flawed as it was too easy to incur into those data races
by accident.
However they identified the proper places that needed changes, and the
idea was sound. I used their work as a blueprint to design this work.
This PR implements:
- Introduction of UMA buffers used by a few buffers
(most notably the ones filled by _fill_instance_data).
Ironically this change seems to positively affect PC more than it does
on Mobile.
Updates D3D12 Memory Allocator to get GPU_UPLOAD heap support.
Metal implementation by Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: TheForge team
Also formally deprecate the RS function for updating an individual probes size. The functionality was removed in 4.0, but the function itself was mistakenly left exposed.