L. Spiro Engine

by **giallanon** » Thu Feb 23, 2012 9:29 pm

Hello, I'm following your blog and I find it very inspiring.
Also I often agree with most of the things you write on gamedev.
Anyway, I see you still have not talked about multithreading in your engine.
I'm currently working on the new incarnation of my own engine, and I'm reading as much as I can about multithreading to see how other people are handling it, so I'm very curious to read about your approach to multithreading in the next gen.

by **L. Spiro** » Fri Feb 24, 2012 5:35 am

Thank you for your interest in my work.

Threading is an area that isn’t fully decided yet. There are things I want to test and benchmark, but there are a few solidified plans as well.

#1: Unsurprisingly, sound already runs on a second thread.
#2: Rendering will very likely be moved to a second thread as well. This type of multi-threaded rendering is not to be confused with DirectX 11’s ability to use multiple threads for rendering to the screen, which is another thing I would plan to employ.
#3: Resource loading will certainly have multi-threaded capabilities, particularly for streaming worlds etc.
#4: Particle systems.
#5: AI.

Other plans are more abstract and will only be decided once I have a chance to test things. Converting images to other formats with the help of OpenMP turned out to be slower than not.

Thread pooling will be employed in all cases where I can imagine using it. I intend to experiment with multi-threading physics. Because of its nature (changing the motion of one object can affect other objects) physics does not leave much room for parallelism, but I intend to create “buckets” of closed systems that can each be handled in parallel. The obvious risk is that the unpredictable nature of object interactions could cause an object, especially if traveling at high speed, to enter another bucket and possibly pass through another object for a frame.
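A minimal sketch of one way such buckets could be formed, assuming a list of contact pairs between bodies is already available (for example from a broad phase). It uses a union-find structure, which is a technique chosen for this illustration and not necessarily what the engine will do; `BucketBuilder` and `buildBuckets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: union-find over body indices.
struct BucketBuilder {
    std::vector<std::size_t> parent;

    explicit BucketBuilder(std::size_t bodyCount) : parent(bodyCount) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }
    std::size_t find(std::size_t i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];  // path halving
            i = parent[i];
        }
        return i;
    }
    void link(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

using Contact = std::pair<std::size_t, std::size_t>;

// Returns one list of body indices per bucket. Bodies that touch, directly or
// through a chain of contacts, end up in the same bucket.
std::vector<std::vector<std::size_t>> buildBuckets(std::size_t bodyCount,
                                                   const std::vector<Contact>& contacts) {
    BucketBuilder sets(bodyCount);
    for (const Contact& c : contacts) {
        assert(c.first < bodyCount && c.second < bodyCount);
        sets.link(c.first, c.second);
    }
    std::vector<std::vector<std::size_t>> buckets;
    std::vector<std::size_t> bucketOfRoot(bodyCount, bodyCount);  // bodyCount means "none yet"
    for (std::size_t i = 0; i < bodyCount; ++i) {
        const std::size_t root = sets.find(i);
        if (bucketOfRoot[root] == bodyCount) {
            bucketOfRoot[root] = buckets.size();
            buckets.emplace_back();
        }
        buckets[bucketOfRoot[root]].push_back(i);
    }
    return buckets;
}
```

Each returned bucket could then be handed to a separate job. The sketch does nothing about the fast-object risk described above; a body that crosses into another bucket mid-step would still be missed until the buckets are rebuilt.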

Another place where it should help is within a new colored shadow mapping routine I have created (which will be documented here later too).

L. Spiro

by **giallanon** » Fri Feb 24, 2012 11:11 am

Well, this is pretty much what I've come to (i.e. I don't have a magic solution, I'm just trying things to see what happens).
Apart from sound, resource streaming and other areas that are obviously "easy" to multithread, my main concerns are about rendering.

At the moment I'm experimenting with one main thread which holds DX9 and performs all rendering and the OS message pump (plus some internal handling), and a secondary thread that runs the application.
I find this approach elegant, because from the 2nd thread I can "submit" a job to the GPU (DX9 for now) and then forget about it; the job will be rendered ASAP without the need to wait for its completion.
The problem with this approach is that a lot of locking can happen when I need to update something that, in turn, will be rendered by the main thread at an unknown time.
GUI updates, for example, are a problem. Whenever I want to update a label text, I have to lock something or maybe send a message to the main thread, which in turn means locking a message queue.

On the other hand, I could go with the "single thread model", where everything is handled by a single thread that loops forever and every n msec renders something to the screen. This way I won't have lock issues, and whenever I need to multithread something (for example animation) I can do it on demand, wait for the jobs to finish and then keep on running with the main thread (render things).
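The "do it on demand, wait for the jobs, then carry on" pattern could be sketched roughly like this. `Animation` and `updateAnimations` are hypothetical names, and `std::async` is used only to keep the example short; it may start a new thread per job, so a real engine would more likely hand the chunks to a thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-object animation state: just a time cursor here.
struct Animation { float time = 0.0f; };

// Advances all animations by dt using up to jobCount parallel jobs and
// returns only after every job has finished.
void updateAnimations(std::vector<Animation>& anims, float dt, std::size_t jobCount) {
    assert(jobCount > 0);
    const std::size_t chunk = (anims.size() + jobCount - 1) / jobCount;
    std::vector<std::future<void>> jobs;
    for (std::size_t begin = 0; begin < anims.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, anims.size());
        // Each job touches a disjoint index range, so no locks are needed.
        jobs.push_back(std::async(std::launch::async, [&anims, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) anims[i].time += dt;
        }));
    }
    for (auto& job : jobs) job.get();  // the main thread waits here, then goes on to render
}
```

Because the main thread blocks until all jobs are done, nothing else reads the animation data while it is being written, which is what keeps this model free of lock issues.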

Not sure which way to go. The 2nd one seems a bit tricky at times, but I guess it will perform better in the end.
Guess I'll have to try and see what happens.

by **L. Spiro** » Fri Feb 24, 2012 3:43 pm

Why do you need extra locks when changing label texts etc.?
The only reason I can see why this would be necessary is if you were submitting a “render this button” command along with a pointer to the button, rather than just submitting the individual API calls needed to render the button.
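As a rough illustration of the difference, a command that copies what it needs at submission time is unaffected by later changes to the button. `DrawTextCommand` and `Button` are hypothetical types made up for this sketch, not part of any real API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical render command: carries everything needed to draw, by value.
struct DrawTextCommand {
    std::string text;
    int x;
    int y;
};

struct Button {
    std::string label;
    int x = 0;
    int y = 0;

    // Logic thread: record a self-contained command. No pointer to the button
    // is stored, so the render thread never reads the button itself.
    void submit(std::vector<DrawTextCommand>& commandList) const {
        commandList.push_back(DrawTextCommand{label, x, y});
    }
};

// Usage: changing the label after submission does not affect the queued command.
void example() {
    Button button;
    button.label = "Start";
    std::vector<DrawTextCommand> commandList;
    button.submit(commandList);
    button.label = "Stop";
    assert(commandList[0].text == "Start");
}
```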

The only time when such a locking system is necessary is when textures, index buffers, or vertex buffers themselves are modified.
In these cases, the solution is simply to use a second copy of the resource, for at least one frame.
If you are rendering text for the label directly to a texture, just keep 2 copies of the texture and swap between them each time the text is changed.
Since label texts are almost always static, this only adds a small amount of resource overhead.
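A minimal sketch of the two-copy idea, assuming the text changes at most once per frame and at most one frame is still in flight on the render thread. `LabelTexture` is a hypothetical stand-in for a real GPU texture, and the string assignment stands in for re-rendering the text into it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a GPU texture that holds a label's rendered text.
struct LabelTexture { std::string renderedText; };

class DoubleBufferedLabel {
public:
    // Logic thread: write the new text into the copy that queued commands are
    // not using, then make it the copy that new commands refer to.
    void setText(const std::string& text) {
        const std::size_t back = 1 - front_;
        copies_[back].renderedText = text;
        front_ = back;
    }
    // The copy that newly built draw commands should reference.
    const LabelTexture& current() const { return copies_[front_]; }
    // The copy that the frame still being rendered may reference.
    const LabelTexture& previous() const { return copies_[1 - front_]; }

private:
    std::array<LabelTexture, 2> copies_{};
    std::size_t front_ = 0;
};

// Usage: the two calls are assumed to happen in different frames.
void example() {
    DoubleBufferedLabel label;
    label.setText("Score: 1");
    label.setText("Score: 2");
    assert(label.current().renderedText == "Score: 2");
    assert(label.previous().renderedText == "Score: 1");  // old copy left intact
}
```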

Index buffers rarely change so it is unlikely you would ever duplicate one of those, and vertex buffers rarely change if you are using GPU skinning.
For that of course you would have a texture that is duplicated instead, but still that should be reasonable.

An important note about multi-threaded rendering is that there is usually a limit as to how many frames can be buffered at once.
Usually double-buffer queues are used, so the render thread is rendering the last frame while a second command buffer is being filled for the current game rendering.
If that buffer is closed (end-of-frame issued) and the first still hasn’t been finished by the render thread, the next command sent to the render thread will stall until the first command buffer is available again.
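A simplified sketch of such a double-buffered queue, using only the C++ standard library. `CommandQueue` and its member names are assumptions made for this example. One simplification to note: here the logic thread stalls inside `endFrame()` itself when the render thread has not finished the previous buffer, instead of on the next submitted command.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Command = std::function<void()>;

class CommandQueue {
public:
    // Logic thread: record a command for the frame being built.
    void submit(Command cmd) { filling_.push_back(std::move(cmd)); }

    // Logic thread: close the frame. Stalls while the previous frame is unfinished.
    void endFrame() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !pendingReady_; });
        pending_.swap(filling_);
        filling_.clear();
        pendingReady_ = true;
        cv_.notify_all();
    }

    // Logic thread: no more frames will follow.
    void shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        quit_ = true;
        cv_.notify_all();
    }

    // Render thread: execute closed frames until shutdown.
    void renderLoop() {
        for (;;) {
            std::vector<Command> frame;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return pendingReady_ || quit_; });
                if (!pendingReady_) return;  // quit requested and nothing left to draw
                frame.swap(pending_);
            }
            for (Command& cmd : frame) cmd();  // runs without holding the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pendingReady_ = false;  // buffer is available again
                cv_.notify_all();
            }
        }
    }

private:
    std::vector<Command> filling_;  // touched only by the logic thread
    std::vector<Command> pending_;  // guarded by mutex_
    bool pendingReady_ = false;
    bool quit_ = false;
    std::mutex mutex_;
    std::condition_variable cv_;
};

// Usage: each frame submits one command that records its frame number.
std::vector<int> runFrames(int frameCount) {
    CommandQueue queue;
    std::vector<int> drawn;  // written only by the render thread while it runs
    std::thread renderThread([&queue] { queue.renderLoop(); });
    for (int frame = 0; frame < frameCount; ++frame) {
        queue.submit([&drawn, frame] { drawn.push_back(frame); });
        queue.endFrame();
    }
    queue.shutdown();
    renderThread.join();
    assert(drawn.size() == static_cast<std::size_t>(frameCount));
    return drawn;
}
```

The logic thread can be at most one closed frame ahead of the render thread, which is the property that bounds how long old copies of a resource have to be kept alive.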

This means you do at least know that your resources will not be drawn more than 1 frame later, so rarely-changed resources can be deleted 1 frame after their duplicates have been made and used.

L. Spiro

by **giallanon** » Fri Feb 24, 2012 4:31 pm

L. Spiro wrote: Why do you need extra locks when changing label texts etc.?
The only reason I can see why this would be necessary is if you were submitting a “render this button” command along with a pointer to the button, rather than just submitting the individual API calls needed to render the button.

Indeed, but while I'm building "the individual API calls", I must be sure that no one will change the button text.
At some point I have to access the button pointer to retrieve the text. OK, I won't do this in the main thread (the one that owns DX9), because the command list is built in the 2nd thread.
But at this point, if I build all the command lists in the 2nd thread, what's the point of having a main thread?
I can do all the jobs in a single thread and when I'm done with the command list, instead of submitting it to the main thread, just render it in the same thread.
The only advantage I see is that the single thread solution will stall on DX->Present() while a multithread approach will let the 2nd thread run even if DX is busy with rendering.
On the other hand, submitting jobs to the main thread involves a lock on the "submit queue".

Anyway, I'm now taking the multithread approach and will see how it performs. I still have a long way to go before I can render something.

by **L. Spiro** » Sat Feb 25, 2012 4:13 am

The problem is that you have swapped your render thread and your logic thread. The main thread should be handling logic and input and submitting commands to the render-thread’s command list, while the render thread should be the “other” thread.
I didn’t want to say anything at first when you said the render thread was handling Windows messages, but your last example of being able to change text while a command list is being built brings to light a very common pitfall in the way in which input is handled.
Since it can be handled entirely correctly regardless of how threads are distributed, I will only talk about the most important aspect of input handling for now.

The common pitfall I see is that people will set up callbacks to be called when input is detected. For example, they may pass a function pointer to the input manager and when the input manager sees the B button has been hit, the callback is called and Mario jumps.
The problem is that without specific synchronization with the logical thread, this jump could happen at any time.
What if physics were running while Mario is in mid-air? The physics routine temporarily sets Mario’s isOnGround value to true and makes some checks to prove it is false. Since he is not on the ground it eventually gets set to false after a few ray casts etc. (by the way this is not how the game really works and in reality it would be the opposite way around if it worked anything like this at all, but this is just an example).
Next frame it repeats, sets the value to true, and the thread is interrupted by the B button, which finds Mario on the ground even though he isn’t, and suddenly Mario can air-jump 1/1,000th of the time.

In essence, you are trying to avoid something that can’t be avoided. Input handling must be restricted to the first task your logical loop handles each frame, and if input is coming from another thread then you simply can’t avoid the “mess” of critical sections, regardless of how your rendering is set up.
My old engine, for iOS devices, ran both the logic and rendering on one thread, but it was not the main thread. Input from the main thread was simply logged until the next frame when the logical processing began, at which point it was copied over to a logical-thread buffer for handling. Naturally there were locks and unlocks, but you really don’t need to worry about the performance of this part. Even on the earliest iOS devices there was no impact.
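The log-then-copy scheme could look roughly like the following. This is a sketch under assumptions, not the old engine's actual code: `InputEvent`, `InputLog`, `Player` and `processInput` are hypothetical names, and a plain `std::mutex` stands in for whatever critical section the platform offers.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical event record; the timestamp is taken when the OS delivers it.
struct InputEvent {
    int button;
    bool pressed;
    double timestamp;
};

class InputLog {
public:
    // Input thread: log the event. Nothing reacts to it yet.
    void record(const InputEvent& e) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(e);
    }
    // Logic thread, first task of each frame: move all logged events into a
    // buffer that only the logic thread touches.
    std::vector<InputEvent> takeAll() {
        std::vector<InputEvent> frameEvents;
        std::lock_guard<std::mutex> lock(mutex_);
        frameEvents.swap(pending_);
        return frameEvents;
    }

private:
    std::mutex mutex_;
    std::vector<InputEvent> pending_;
};

// Hypothetical game state for the jump example.
struct Player {
    bool isOnGround = true;
    int jumps = 0;
};

// Start of the logic frame: input is applied before physics or anything else
// runs, so isOnGround is never observed in a half-updated state.
void processInput(InputLog& log, Player& player, int jumpButton) {
    for (const InputEvent& e : log.takeAll()) {
        if (e.button == jumpButton && e.pressed && player.isOnGround) {
            player.isOnGround = false;
            ++player.jumps;
        }
    }
}

// Usage: two presses arrive before the frame starts; only the first one jumps.
void example() {
    InputLog log;
    Player mario;
    log.record(InputEvent{1, true, 0.010});
    log.record(InputEvent{1, true, 0.012});
    processInput(log, mario, 1);
    assert(mario.jumps == 1);
    assert(log.takeAll().empty());
}
```

The lock is held only for a push or a swap, which fits the observation above that its cost is not worth worrying about.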

Input that is asynchronous with the logical thread will cause problems not just with text in your labels, but in many other potential places.
It is something you need to fix regardless of your rendering method, and luckily it will implicitly fix the problem of being able to change label texts while the label is submitting render commands, since those 2 events would then be on the same thread.

Your options are to either buffer the input until the logical thread is ready, with a few extra locks that would only stall under rare conditions, or to swap the threads so that input is already on the logical thread from the start. It should still be buffered but it doesn’t have to be copied across threads.

Again, you don’t need to fear having input on another thread and all the locks and copies it involves. The fact is, if that thread is devoted only to input (not also rendering), then you can get much smoother input. Since you won’t be waiting for rendering or physics to get the input from the OS, you can always get them immediately and your timestamps for each event will be more accurate. This will give you better curves with the mouse or touch input for iOS.

L. Spiro

by **giallanon** » Sun Feb 26, 2012 9:04 pm

Actually I'm not processing any input in the main loop, I'm just gathering and buffering it using a double array.
To make it short, at any time, one array is write-only and stores input from the mouse, keyboard and whatever else, while the other array is read-only and is the one that the logic thread can use to process input.
When the logic thread has finished with the input array and is ready to process another burst of data, it calls AcquireInput(), which sends a command to the main thread that, in turn, swaps the 2 buffers and sets a flag to let the logic thread know that it can now process the input array.

So the sync problem with the GUI button doesn't exist, since the "on mouse over" event is not async from the logic thread's point of view.

I was thinking of a more generic situation in which, for example, a loading thread wants to load something and then output a line of text saying "xxx has been loaded".
Anyway, I guess I simply have to avoid situations like that. The loading thread could load and, on finishing, send a sort of message to the logic thread saying that it has finished with the load. The logic thread will take care of printing the text.
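One possible shape for that hand-off, sketched with `std::future` in place of a hand-written message queue (a substitution made for brevity, not what the post describes literally). `loadResource`, `PendingLoad`, `beginLoad` and `reportFinishedLoads` are hypothetical names, and the "console" is just a vector of strings.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical loader: stands in for real file I/O on a loading thread.
std::string loadResource(std::string name) {
    return name;  // pretend the resource has been read from disk
}

struct PendingLoad {
    std::future<std::string> done;
};

// Logic thread: kick off a load on another thread and keep a handle to it.
PendingLoad beginLoad(const std::string& name) {
    return PendingLoad{std::async(std::launch::async, loadResource, name)};
}

// Logic thread, once per frame: report loads that have finished. Never blocks,
// and the text is only ever written from the logic thread.
void reportFinishedLoads(std::vector<PendingLoad>& pending,
                         std::vector<std::string>& console) {
    for (auto it = pending.begin(); it != pending.end();) {
        if (it->done.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            console.push_back(it->done.get() + " has been loaded");
            it = pending.erase(it);
        } else {
            ++it;
        }
    }
}

// Usage: the while loop stands in for the logic thread's frame loop.
void example() {
    std::vector<PendingLoad> pending;
    std::vector<std::string> console;
    pending.push_back(beginLoad("level1"));
    while (!pending.empty()) {
        reportFinishedLoads(pending, console);
        std::this_thread::yield();
    }
    assert(console.size() == 1 && console[0] == "level1 has been loaded");
}
```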

by **L. Spiro** » Tue Feb 28, 2012 12:19 pm

Then I would say you are on the right track. I would keep going forward with the standard second-thread render and just handle the special cases as they arise.

Putting input on its own thread is worth considering, for the reasons mentioned before.

L. Spiro
