Transient
I’ve spent most likely the final 12 months interested by implementing an Octree knowledge construction into my C++ sport engine challenge for scene administration and frustum culling of lights and meshes. Proper now my objective is to beat the efficiency of my present iterative brute power method to frustum testing each single mild and mesh within the scene.
I lastly determined to assault this head on and have over the previous week carried out a templated Octree class which permits me to retailer knowledge inside my octree comparable to UUID (uint32_t in my case). I additionally plan to have the ability to repurpose this construction for different options within the sport engine, however for now, frustum culling is my major objective for this technique.
Now right down to brass tacks, I’ve a efficiency situation with std::vector::insert() and the recursive nature of my present design.
Construction
Octree
, that is the bottom class which manages all API calls from the person comparable to insert, take away, replace, question (AABB, Sphere, or Frustum), and so forth. Once I create the Octree, the constructor takes an OctreeConfig struct which holds primary info on what properties the Octree ought to take, e.g., MinNodeSize, PreferredMaxDataSourcesPerNode, and so forth.OctreeDataSource
, this can be a easy struct that holds an AABB bounding field that represents the information in 3D house, and the worth of the DataType, e.g., a UUID. I plan to additionally lengthen this so I can have bounding spheres or factors for the information sorts aswell.OctreeNode
, this can be a non-public struct throughout the Octree class, as I are not looking for the person to entry the nodes instantly; nevertheless, every node has astd::array
for its kids, and it additionally holds a, 8> std::vector<:shared_ptr>>>
which holds a vector of sensible tips to the information supply.
Downside
My present situation is the efficiency influence of std::vector::insert()
that is known as recursively by way of the OctreeNode’s once I name my Octree::Question(CameraFrustum) methodology.
As seen above in my construction, every OctreeNode holds an std::vector
of information sources and once I question the Octree, it vary inserts all of those vectors right into a single pre-allocated vector that’s handed down the Octree by reference.
Once I question the Octree, it takes the next primary steps:
Question Technique
- Octree::Question
- Create a static
std::vector
and make sure that on creation it has reserved house for the question (presently I’m simply laborious coding this to 1024 as this sufficiently holds all of the mesh objects in my present octree take a look at scene, so there are not any reallocations when performing anstd::vector
vary insert). - Clear the static vector.
- Name
OctreeNode::Question
and move the vector as reference.
- Create a static
- OctreeNode::Question
- Examine Rely of information sources in present node and kids, if now we have no knowledge sources on this node and it is kids, we return – simples 🙂
- Conduct a frustum verify on the present node AABB bounds. Result’s both Accommodates, Intersects, or DoesNotContain.
- Accommodates: (PERFORMANCE IMPACT HERE) If the present node is absolutely contained throughout the frustum, we are going to merely embody all DataSources into the question from the present and all youngster nodes recursively. We name
OctreeNode::GatherAllDataSources
, and move the static vector created inOctree::Question()
by reference. - Intersects: We individually frustum verify every
OctreeDataSource::AABB
inside this node’s knowledge supply vector, then we recursively nameOctreeNode::Question
on every of the kids to carry out this perform recursively.
- Accommodates: (PERFORMANCE IMPACT HERE) If the present node is absolutely contained throughout the frustum, we are going to merely embody all DataSources into the question from the present and all youngster nodes recursively. We name
OctreeNode::GatherAllDataSources (the issue youngster)
I’ve used profiling macros to measure the gathered period of time this perform takes every body. If I name Question as soon as in my essential engine sport loop, the GatherAllDataSources() takes roughly 60% if no more of your complete Question methodology time.
You may as well see from these profile outcomes that the Octree Question is taking double the time as “Ahead Plus – Frustum Culling (MESHES)” which is the brute power method to frustum checking each mesh throughout the scene (the scene has 948 meshes with AABBs).
I’ve narrowed the difficulty right down to the road of code with the remark beneath:
void GatherAllDataSources(std::vector& out_data) {
L_PROFILE_SCOPE_ACCUMULATIVE_TIMER("Octree Question - GatherAllDataSources"); // Accumulates a profile timer outcomes every time this methodology is known as. Profiler begins time on development and stops timer and accumulates end result inside a ProfilerResults class.
if (Rely() == 0) {
CheckShouldDeleteNode();
return;
}
if (!m_DataSources.empty()) {
// That is the road of code which is taking many of the queries search time
// As you possibly can see beneath aswell, the time complexity will increase as a result of
// I'm calling this perform recursively for all kids, virtually,
// gathering all knowledge sources inside this node and all kids
out_data.insert(out_data.finish(), m_DataSources.start(), m_DataSources.finish());
}
if (!IsNodeSplit())
return;
// Recursively collect knowledge from youngster nodes
for (const auto& youngster : m_ChildrenNodes) {
if (youngster) {
child->GatherAllDataSources(out_data); // Go the identical vector to keep away from reminiscence allocations
}
}
}
Query Time
How can I considerably enhance the effectivity of Gathering knowledge sources recursively from my youngster nodes?
I’m open to thoroughly altering the method of how knowledge sources are saved throughout the Octree, and the way the general construction of the Octree is designed, however that is the place I get caught.
I am very inexperienced in relation to algorithm optimisation or C++ optimisation, and as this can be a new algorithm I’ve tried to implement, I am discovering it very troublesome to discover a answer to this drawback.
Any ideas/methods are welcome!
You’ll find the complete model of my present Octree implementation code right here (please notice I’m not completed but with different performance, and I’ll most likely be again if I am unable to discover options for Insert and Take away optimisation!).
Listed below are some sources I’ve reviewed:
Should you’re additionally involved in the remainder of my code base it may be discovered on GitHub by way of this hyperlink. I principally function within the Growth department. These adjustments have not been pushed but, however I’ve confronted a whole lot of challenges throughout this challenge’s journey so when you’ve got any additional insights to my code or have any questions on how I’ve carried out completely different options, please give me a shout!