- Vulkan implementations in `RenderingDeviceDriverVulkan`.
- Raytracing instruction list in `RenderingDeviceGraph`.
- Functions to create acceleration structures and raytracing pipelines
in `RenderingDevice`.
- Raygen, Miss, and ClosestHit shader stages support.
- GDScript bindings.
- Update classes documentation.
- Unimplemented placeholders for Metal and D3D12.
- Build acceleration structure command.
- Expose a shader preprocessor define.
- Align build scratch address.
- Create STB after creating the pipeline.
- Separate acceleration structure barriers.
- Use transforms in TLAS instances.
- AnyHit and Intersection stages.
- Optionally set acceleration structure build input buffer usage.
- Introduce instances buffer.
- Move scratch buffers to RenderingDevice.
- Rename AccelerationStructureGeometryBits.
- Use regular buffer creation and free routines for scratch buffer.
- Store trackers in acceleration structures.
- Bump Shader SPIR-V version to 1.4
- Add Position attribute location in blas_create.
- Encapsulate MoltenVK check in preprocessor macro.
- Split SUPPORTS_RAYTRACING for pipeline and query.
- Add support for vertex bindings and UMA vertex buffers in D3D12.
- Simplify 2D instance params and move more into per-batch data to save
bandwidth
Co-authored-by: Skyth <19259897+blueskythlikesclouds@users.noreply.github.com>
Co-authored-by: Clay John <claynjohn@gmail.com>
Co-authored-by: A Thousand Ships <96648715+athousandships@users.noreply.github.com>
This work is a heavily refactored and rewritten from TheForge's initial
code.
TheForge's original code had too many race conditions and was
fundamentally flawed as it was too easy to incur into those data races
by accident.
However they identified the proper places that needed changes, and the
idea was sound. I used their work as a blueprint to design this work.
This PR implements:
- Introduction of UMA buffers used by a few buffers
(most notably the ones filled by _fill_instance_data).
Ironically this change seems to positively affect PC more than it does
on Mobile.
Updates D3D12 Memory Allocator to get GPU_UPLOAD heap support.
Metal implementation by Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: TheForge team
Introduce a specialised `texture_get_data` for `RenderDeviceDriver`,
which can retrieve the texture data from the GPU driver for shared
textures (`TEXTURE_USAGE_CPU_READ_BIT`).
Closes#108115
Metal Support contributed by Migeran (https://migeran.com) and Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Gergely Kis <gergely.kis@migeran.com>
Texture::slice_trackers is now a pointer and is allocated on demand only when a shared texture is created.
This makes copying Texture significantly faster.
The function update_perf_report() is expensive and is called every
frame.
Most of it is not necessary unless the user calls get_perf_report
Affects #102173
PR #100062 introduced BUFFER_USAGE_DEVICE_ADDRESS_BIT.
However it did so by adding a boolean to uniform_buffer_create(), called
"bool p_enable_device_address".
This makes maintaining backwards compatibility harder because I am
working on another feature that would require introducing yet another
bit flag.
This would save us the need to add fallback routines when the feature I
am working on makes it to Godot 4.5.
Even if my feature doesn't make it to 4.5 either, this PR makes the
routine more future-proof.
This PR also moves STORAGE_BUFFER_USAGE_DEVICE_ADDRESS into
BUFFER_CREATION_DEVICE_ADDRESS_BIT, since it's an option available to
both storage and uniforms.
This PR also moves the boolean use_as_storage into
BUFFER_CREATION_AS_STORAGE.
Fixes#90017Fixes#90030Fixes#98044
This PR makes the following changes:
# Force processing of GPU commands for frame_count frames
The variable `frames_pending_resources_for_processing` is added to track
this.
The ticket #98044 suggested to use `_flush_and_stall_for_all_frames()`
while minimized.
Technically this works and is a viable solution.
However I noticed that this issue was happening because Logic/Physics
continue to work "business as usual" while minimized(\*). Only Graphics
was being deactivated (which caused commands to accumulate until window
is restored).
To continue this behavior of "business as usual", I decided that GPU
work should also "continue as usual" by buffering commands in a double
or triple buffer scheme until all commands are done processing (if they
ever stop coming). This is specially important if the app specifically
intends to keep processing while minimized.
Calling `_flush_and_stall_for_all_frames()` would fix the leak, but it
would make Godot's behavior different while minimized vs while the
window is presenting.
\* `OS::add_frame_delay` _does_ consider being minimized, but it just
throttles CPU usage. Some platforms such as Android completely disable
processing because the higher level code stops being called when the app
goes into background. But this seems like an implementation-detail that
diverges from the rest of the platforms (e.g. Windows, Linux & macOS
continue to process while minimized).
# Rename p_swap_buffers for p_present
**This is potentially a breaking change** (if it actually breaks
anything, I ignore. But I strongly suspect it doesn't break anything).
"Swap Buffers" is a concept carried from OpenGL, where a frame is "done"
when `glSwapBuffers()` is called, which basically means "present to the
screen".
However it _also_ means that OpenGL internally swaps its internal
buffers in a double/triple buffer scheme (in Vulkan, we do that
ourselves and is tracked by `RenderingDevice::frame`).
Modern APIs like Vulkan differentiate between "submitting GPU work" and
"presenting".
Before this PR, calling `RendererCompositorRD::end_frame(false)` would
literally do nothing. This is often undesired and the cause of the leak.
After this PR, calling `RendererCompositorRD::end_frame(false)` will now
process commands, swap our internal buffers in a double/triple buffer
scheme **but avoid presenting to the screen**.
Hence the rename of the variable from `p_swap_buffers` to `p_present`
(which slightly alters its behavior).
If we want `RendererCompositorRD::end_frame(false)` to do nothing, then
we should not call it at all.
This PR reflects such change: When we're minimized **_and_**
`has_pending_resources_for_processing()` returns false, we don't call
`RendererCompositorRD::end_frame()` at all.
But if `has_pending_resources_for_processing()` returns true, we will
call it, but with `p_present = false` because we're minimized.
There's still the issue that Godot keeps processing work (logic,
scripts, physics) while minimized, which we shouldn't do by default. But
that's work for follow up PR.