Dependency oriented threading

Information

For my specialization I decided to create dependency oriented threading for systems in entt. Ever since the first year of TGA I’ve been playing with the thought of automatically threading gameplay systems in such a way that most of the gameplay code is written the exact same as if it wasn’t in a multi-threaded environment with a separate system drawing conclusions on what you can run in parallel from constraints on components. I tried it multiple times and got a prototype running but I could never figure out how to properly constrain the developer so that they can’t accidentally create race conditions.

This project is mainly to try and understand how threaded systems work and how to work within a highly threaded environment while also minimizing the pain points working in a multi-threaded environment as debugging without tools to help validate is infamously difficult in a threaded environment.

How

For me to be able to automatically thread the systems I first need a standardized way of defining dependencies within the systems. For this I ended up using the hashcode of the component type the system is accessing as well as if we’re writing to the component or not. This data can then be used to build dependencies between jobs.

struct JobDependency
{
    size_t myID;
    bool myIsWriting;
};

For my case this was used to build JobDescriptions which held data about what function to execute as well as the reading/writing IDs (IDs being class hashcodes for my project) and dependencies which are indexes to other JobDescriptions that this one depends on.

JobDescriptions also held onto a name for debugging tools so I could more easily identify in what JobDesc things went wrong.

struct JobDesc
{
    std::function<void()> myFunc;
    std::set<size_t> myReadingIDs
    std::set<size_t> myWritingIDs
    std::set<size_t> myDependencies
    JobHandle myHandle;
    int mySortingOrder;
    unsigned int myUpdatingFrequency;
};

One part of this project that was really hard to get right was the JobScheduler. It went through multiple iterations where at first I had a bucket system where jobs get dispatched in groups depending on the dependencies between them but it didn’t even get close to maximum concurrency and it forced a lot of unnecessary sync points.

So I ended up throwing that system away in favor of waiting for specific jobs that need to be executed first which added more requirements to the JobSystem to be able to give out JobHandles and wait for jobs to finish.

struct Job
{
    std::function<void()> myFunc;
    std::vector<JobHandle> myDependencies;
    JobHandle myHandle;
    unsigned int myChildren;
    bool myIsExecuting;
};

Scheduling Jobs

This was done by extending the data within a Job in the JobSystem with a JobHandle and a vector of JobHandles as dependencies, as well as the number of children and a bool to check if the job is currently executing or not.

The reason I decided to add a number of children variable was to minimize context switching. Imagine a job only has one child dependency, why risk a context switch when the child job is likely to be able to be ran on the current thread without notifying other threads.

The reason we need to know if the job is executing is because a Job is considered completed as soon as it isn’t queued anymore so we cant remove it from the queue when we start executing but we don’t want to call the same Job twice, hence the IsExecuting boolean to make sure we only call this job once.

inline void JobScheduler::Build()
{
    std::ranges::sort(myJobs, {}, &JobDesc::mySortingOrder);

    for (unsigned i = 0; i < myJobs.size(); ++i) {
        std::set<JobHandle> implicitDependencies{};
        myJobs[i].myDependencies = {};

        for (int otherJobID = i - 1; otherJobID >= 0; --otherJobID)
        {
            if (implicitDependencies.contains(otherJobID)) continue;
            if (!myJobs[i].CanExecuteAlongside(myJobs[otherJobID]))
            {
                myJobs[i].myDependencies.insert(otherJobID);
                AddImplicitDependency(implicitDependencies, myJob[otherJobID]);
            }
        }
    }
}

With the final version of the JobScheduler it now builds dependencies with the use of JobDescriptions to support the addition of systems in an unordered manner so gameplay systems can correctly register their systems in relation to engine systems and it also gives the engine systems a way of placing systems after gameplay systems by default.

It also finds all implicit dependencies to avoid having massive dependency chains as it would add more overhead to both the job scheduler as well as the JobSystem.

inline void JobScheduler::Execute()
{
    for (const JobDesc& job : myJobs) {
        if (myExecutionAmount % job.myUpdatingFrequency != 0) continue;
        std::vector<JobHandle> dependencies{};

        for (size_t dependency : job.myDependencies)
        {
            dependencies.push_back(myJobs[dependency].myHandle);
        }

        job.myHandle = JobSystem::Enqueue(job.myFunc, std::move(dependencies));
    }
}

When executing Jobs all we really have to do is collect the JobHandles of the jobs dependencies while queueing the current Job with the newly created handles vector and storing the resulting JobHandle in the JobDescriptions handle variable for the upcoming jobs to use.

The beauty behind this is that it’s completely fine to just not run a Job as it will by default count as completed so implementing updating frequencies is a breeze.

Programmer QoL

template <typename ... Args>
std::vector<JobDependency> System<Args...>::GetDependencies()
{
    return { JobDependency{ typeid(Args).hash_code(), !std::is_const_v<Args> }... }
}

In order to define dependencies you have to create a vector of JobDependencies. This is done automatically when extending the System class through template parameters and writeable/readonly is determined through the constness of the type in the template parameter.

In use this would look something like this ->

class MeshRenderingSystem : public 
System<SceneSingleton, const MeshRenderer, const SpriteSheetComponent, const Transform>
{
public:
    void Update(registry_handle& aView) override;    
}

To make it easier for the programmers I’ve forwarded regular entt functions through a SystemRegisterView which constrains the functions to only use the previously specified types and also automatically resolves the constness of the defined types making it extremely easy to switch between readonly and writeable when necessary.

Performance

Real-world scenario

Threaded

~19ms

Synchronous

~22ms

~3ms difference

Removed big blocker from scenario above

~3.5ms

~5ms

~1.5ms difference

Closing Thoughts

I think it’s interesting that the difference of threading to single threaded performance isn’t that large considering that there are multiple systems that are parallelized. It reminds me of a talk given at GDC 2015 about multithreading the Destiny 2 engine where they mentioned that the ideal duration of a job is around (0.5ms - 2ms) but in my scenario most of my parallelizable jobs are running for < 0.2ms which proves their point. It might’ve saved some time but the overhead of having dependencies and waking up sleeping threads cuts into those profits. The only place that the threading did see measurable difference was where it could correctly identify multiple big systems and run them in parallel

I think that the auto threading of systems is a really good in concept but it’s still important to think about how the threading is done and to make sure that your systems are flexible in their update order to allow maximum concurrency. I think there’s probably some overhead that could be optimized but more importantly I think if I were to structure the data correctly and bundle some systems together to try to hit the 0.5ms - 2ms uptime for a job as well as think more about threading the game more I would probably get way better results.

Unrelated to performance, I still haven’t constrained the developer perfectly as systems outside of entt will need to be correctly synced in another way. One way to enforce these types of syncs is to develop an execution engine just like how the Destiny 2 did, where you can validate that all handles to all systems have properly been requested by the program and if two threads ever try to access the same resource you could assert to make sure you don’t get race conditions.