
Basic Graphics Implementation


When I first started this site I was originally planning to post about the construction of the engine along the way, what design decisions I made, and why.  Now that I am restarting the graphics portion of the engine it is a good chance to do that.  I won’t cover everything I do for the sake of my own development time (because I am writing these articles as I code), but here-and-there I will cover the important parts that are not often covered anywhere else.



A picture is worth 142,462 bytes.

Organization of LSEngine

The first decision that should be made is how to organize your files and folder structure, including file-name prefixes etc.  I made this decision long ago and have been consistent with it throughout the engine.  Inside of LSGraphicsLib, the LSG prefix means, “L. Spiro Graphics”, obviously.  After that, each module has its main .CPP and .H on the root level, and every other file for that module is no more and no less than 1 folder deep.  The folder tree of the project matches exactly the folder tree on disk.

All headers in the module must first include the module’s main header, which is LSGGraphicsLib.h in this case.  There are a few exceptions (such as LSGDirectX11.h here) but I will not get into that.  After that, additional headers are included alphabetically.


Design Decisions

I tend to avoid globals/statics as often as I can, and considered heavily whether I should make my CDirectX11 class (and the others like it) instance-based or not.  There are plenty of design-based reasons to make it so, but fewer practical reasons.  For the sake of getting things done I don’t want to pass around a device/graphics pointer, and I don’t need to switch APIs at run-time, so I don’t need a virtual interface etc. based on which rendering API is active.  I am also fine with a limit of 1 graphics device per instance of the application.  There are advantages to being able to have more, but until now I have been able to handle those cases just fine with 1 graphics device, and the fact is that those advantages can sometimes cause more trouble than they fix.

The main cases in which you would need multiple devices are debugging and tools.  Debugging can be handled any number of ways and is not performance-intensive, so multiple devices are not really necessary.  On the tools side, however, a perfect example of why you might want multiple devices would be to integrate the engine into Qt for cross-platform editors etc.

The problem is that each OpenGL widget in Qt has its own OpenGL context, which you could handle easily if you made one graphics device per context.  But resources in the game engine would then have to be specifically associated with each context.  If I have a character in one window and I want to check its textures in another window, my system will just get the list of CTexture2D pointers on it and try to draw them in the second window.  But that will fail because those textures are not registered with that OpenGL context.  The way around it is that my CTexture2D class would then have to have a mapping between OpenGL contexts and GLuint texture IDs so it knows which one to activate given the currently active OpenGL context.  It would be similar if you tried to solve the problem in Direct3D by making multiple Direct3D devices.

Frankly, I’d rather solve that problem using small hacks and a single device rather than jumble up engine code with a feature that would only be used in tools.

With it decided that I will only have 1 rendering device and I am fine being limited to that for the sake of performance and ease in making progress in code, the last decision is how to implement that device.  I never did nor would I ever consider using a singleton.  I decided to just make every member and method of CDirectX11 (and its counterparts for other APIs) purely static.

“What is the point in a class that is purely static?  Why not just use C functions then?”  If they are functionally the same, then there are reasons why a purely static class should be preferred, at least in my case.

  1. Consistency.  If you have your own naming convention, would you code 90% of your game/engine with that convention but then use a different convention for the other 10%?  Why not?  Because it’s not consistent/clean.  Nothing else in my engine is a C function except WinMain(), so why would I suddenly switch to C functions for just 2% of the engine?  It’s functionally equivalent, but sticking to classes makes it consistent as well.
  2. Classes offer better visibility scope.  C hides “protected” functions by declaring them in the .C files themselves, leaving only the public functions declared in the .H file.  You have some visibility protection in C, but you don’t have “private” protection and it again reeks of inconsistency.  In C, functions declared and defined inside a .C file are “protected”, not “private”.  If another .C file wants to use that secret function it simply has to declare it in its own .C file.  It will be found during linkage.  Then of course there is just the fact that I declare 99% of all methods in header files, so why would I suddenly change it up and put a few in .C/.CPP files?
  3. One of the most practical reasons is really very simple: Declaring globals in C is a pain in the ass.  For all translation units but 1 you need the “extern” keyword.  In C++ you simply declare a member of the class as static and initialize it in the .CPP file.
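Point 3 can be illustrated with a minimal sketch of a purely static class.  The class and member names here (CGfxDevice, m_ui32DrawCalls) are hypothetical and are not taken from LSEngine; the comments mark which part would sit in the header and which in the single .CPP file.

```cpp
#include <cstdint>

// ---- Would live in the header (a hypothetical GfxDevice.h) ----
class CGfxDevice {
public :
	// Records one draw call.  No instance is ever needed.
	static void				Draw() { ++m_ui32DrawCalls; }

	// Gets the number of draw calls recorded so far.
	static std::uint32_t	GetDrawCalls() { return m_ui32DrawCalls; }

protected :
	// Shared by the whole program, but only reachable through the class.
	// No "extern" is needed in any translation unit.
	static std::uint32_t	m_ui32DrawCalls;

private :
	// Declared but never defined: the class is not meant to be instantiated.
	CGfxDevice();
};

// ---- Would live in exactly one .cpp file ----
std::uint32_t CGfxDevice::m_ui32DrawCalls = 0;
```

Any file that includes the header can call CGfxDevice::Draw() directly, while the counter itself stays out of reach of outside code, which is the visibility control point 2 describes.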

Structured on Performance
There will be 2 layers in the CDirectX11 (and friends) class: The low-level layer is the last stand before any actual Direct3D 11 calls get made.  No calls to Direct3D 11 can be made anywhere else; it is a direct wrapper around it, but with an important feature for performance: redundant state checking.  It keeps track of the last values it sent to Direct3D 11 and never sends the same value twice.
There are 2 ways to do this, which I will call the “active” way and the “last-minute” way.
The active way looks like this (snippet from Direct3D 9):

	/**
	 * Sets the texture wrapping modes for a given slot.
	 *
	 * \param _ui32Slot Slot in which to set the texture wrapping modes.
	 * \param _taU The U wrapping mode.
	 * \param _taV The V wrapping mode.
	 */
		if ( _taU != m_taWrapU[_ui32Slot] ) {
			m_taWrapU[_ui32Slot] = _taU;
			m_pd3dDevice->SetSamplerState( _ui32Slot, D3DSAMP_ADDRESSU, _taU );
		}
		if ( _taV != m_taWrapV[_ui32Slot] ) {
			m_taWrapV[_ui32Slot] = _taV;
			m_pd3dDevice->SetSamplerState( _ui32Slot, D3DSAMP_ADDRESSV, _taV );
		}

Here, SetSamplerState() is called when the sampler state for a slot changes and only when it changes.  That sounds fine in theory but if in a single draw call the sampler state was already set to X, and then you changed it to Y, and then you changed it back to X before rendering, it would result in a useless change to Y and back.  At first one’s natural reaction might be to think, “Why would you do that?”, but it just so happens that there is a good reason you would do that, explained below.

The last-minute way looks like this:

	/**
	 * Activate this texture in a given slot.
	 *
	 * \param _ui32Slot Slot in which to place this texture.
	 * \return Returns true if the texture is activated successfully.
	 */
	LSBOOL LSE_CALL CDirectX11CubeTexture::Activate( LSUINT32 _ui32Slot ) {
		CDirectX11::m_psrvActiveTextures[_ui32Slot] = m_psrvShaderView;
		return true;
	}

	/**
	 * Called just before rendering to allow performing of any final tasks.
	 */
	LSVOID LSE_CALL CDirectX11::PreRender() {
		// Set textures.
		{
			LSUINT32 ui32Index = 0UL, ui32Total = 0UL;
			LSUINT32 ui32Max = CStd::Min( LSG_MAX_TEXTURE_UNITS, CFndBase::m_mMetrics.ui32MaxTexSlot );
			for ( LSUINT32 I = 0UL; I < ui32Max; ++I ) {
				if ( m_psrvActiveTextures[I] != m_psrvLastActiveTextures[I] ) {
					// This slot changed; extend the current run of changed slots.
					++ui32Total;
					m_psrvLastActiveTextures[I] = m_psrvActiveTextures[I];
				}
				else {
					// An unchanged slot ends the run; send the whole run in one call per stage.
					if ( ui32Total ) {
						m_pdcContext->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
						m_pdcContext->VSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
					}
					ui32Index = I + 1UL;
					ui32Total = 0UL;
				}
			}
			// Send any run that extends to the last slot.
			if ( ui32Total ) {
				m_pdcContext->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
				m_pdcContext->VSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
			}
		}
	}

This method allows CDirectX11::m_psrvActiveTextures to be changed at will any number of times between renders, but only the final value just before rendering actually matters. Only if the value is different from the last time a render operation was done will any calls to Direct3D 11 be generated. This is the correct way to handle redundant states, and it fixes the problem with the active method in which a state can go from X to Y, then back to X before a render.

Why would you want to do that? It allows you to efficiently set a default state between renders, which allows you to create a more stable rendering system in which each object defines only the states that differ from the defaults. This is how our in-house engine at tri-Ace works, and it helps render stability by guaranteeing which states are set on each render, rather than allowing them to bleed over into another object’s render call. The only other way to make this type of guarantee is to keep a full set of state changes for each render call, and while no different in terms of performance it is a hassle to maintain and update, and it consumes much more memory. Switching from active to last-minute and supporting a default set of states between renders are 2 of the reasons I wanted to start over on my renderer.
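The difference between the two styles can be measured with a small stand-alone sketch.  Everything in it is hypothetical (FakeApi, ActiveCache, LastMinuteCache and the two counting functions are illustration names, not engine code); FakeApi only counts how many state calls would have reached the real API.  Three objects are rendered, each wanting state 1, with the default state 0 restored before each one.

```cpp
#include <cstdint>

// Stand-in for the graphics API: it only counts the state calls that reach it.
struct FakeApi {
	std::uint32_t ui32Calls = 0;
	void SetState( int /*_iValue*/ ) { ++ui32Calls; }
};

// "Active" filtering: every real change is forwarded as soon as it is requested.
struct ActiveCache {
	FakeApi & faApi;
	int iLast;
	void Set( int _iValue ) {
		if ( _iValue != iLast ) {
			iLast = _iValue;
			faApi.SetState( _iValue );
		}
	}
};

// "Last-minute" filtering: requests only overwrite a pending value, and the
// comparison against the last value sent happens once, just before rendering.
struct LastMinuteCache {
	FakeApi & faApi;
	int iPending;
	int iLastSent;
	void Set( int _iValue ) { iPending = _iValue; }
	void PreRender() {
		if ( iPending != iLastSent ) {
			iLastSent = iPending;
			faApi.SetState( iPending );
		}
	}
};

// Counts the API calls made by the active style.
std::uint32_t CountActiveCalls() {
	FakeApi faApi;
	ActiveCache acCache = { faApi, 0 };
	for ( int I = 0; I < 3; ++I ) {
		acCache.Set( 0 );	// Restore the default state.
		acCache.Set( 1 );	// The object overrides it.
		// (The draw call would go here.)
	}
	return faApi.ui32Calls;
}

// Counts the API calls made by the last-minute style for the same frame.
std::uint32_t CountLastMinuteCalls() {
	FakeApi faApi;
	LastMinuteCache lmcCache = { faApi, 0, 0 };
	for ( int I = 0; I < 3; ++I ) {
		lmcCache.Set( 0 );	// Restore the default state.
		lmcCache.Set( 1 );	// The object overrides it.
		lmcCache.PreRender();
		// (The draw call would go here.)
	}
	return faApi.ui32Calls;
}
```

Under these assumptions CountActiveCalls() returns 5 and CountLastMinuteCalls() returns 1: the active cache pays for every trip through the default state, while the last-minute cache only sends the single change that is still in effect when a render happens.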

The active style does still have a place even in an otherwise last-minute setup. For Direct3D 9 and OpenGL/OpenGL ES, I will be emulating Direct3D 11 (another reason for starting over on the graphics module is that I was previously emulating Direct3D 9 via OpenGL/OpenGL ES and Direct3D 11), which means I will be creating structures that group similar state settings together and then applying them that way. My engine will have a structure that basically matches D3D11_BLEND_DESC, for example, and when I apply this structure all at once I will emulate it in Direct3D 9 by changing each state on the structure one-by-one when it is applied. The structure will be applied via the last-minute method, but each state on the structure will be applied one-by-one via the active method.
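A rough sketch of that combination follows.  The names BlendDescSketch and CBlendEmulator are hypothetical, the structure is reduced to three fields, and a counter stands in for the real IDirect3DDevice9::SetRenderState() calls (the render states named in the comments are the Direct3D 9 ones each field would most plausibly map to).  It also assumes the device starts with blending disabled and both factors at 0, and that a description is not modified after it has been applied, since the last-minute check compares pointers only.

```cpp
#include <cstdint>

// Hypothetical engine-side blend description (a much-reduced D3D11_BLEND_DESC).
struct BlendDescSketch {
	bool			bBlendEnable;
	std::uint32_t	ui32SrcBlend;
	std::uint32_t	ui32DestBlend;
};

class CBlendEmulator {
public :
	CBlendEmulator() :
		m_pbdPending( nullptr ),
		m_pbdLast( nullptr ),
		m_bdSent{ false, 0, 0 },
		m_ui32ApiCalls( 0 ) {}

	// Last-minute part: only remember which description should be active.
	void SetBlendState( const BlendDescSketch * _pbdDesc ) { m_pbdPending = _pbdDesc; }

	// Called just before a draw.
	void PreRender() {
		if ( !m_pbdPending || m_pbdPending == m_pbdLast ) { return; }
		m_pbdLast = m_pbdPending;
		// Active part: each field is checked against the value last sent.
		if ( m_bdSent.bBlendEnable != m_pbdPending->bBlendEnable ) {
			m_bdSent.bBlendEnable = m_pbdPending->bBlendEnable;
			++m_ui32ApiCalls;	// Would be SetRenderState( D3DRS_ALPHABLENDENABLE, ... ).
		}
		if ( m_bdSent.ui32SrcBlend != m_pbdPending->ui32SrcBlend ) {
			m_bdSent.ui32SrcBlend = m_pbdPending->ui32SrcBlend;
			++m_ui32ApiCalls;	// Would be SetRenderState( D3DRS_SRCBLEND, ... ).
		}
		if ( m_bdSent.ui32DestBlend != m_pbdPending->ui32DestBlend ) {
			m_bdSent.ui32DestBlend = m_pbdPending->ui32DestBlend;
			++m_ui32ApiCalls;	// Would be SetRenderState( D3DRS_DESTBLEND, ... ).
		}
	}

	// Gets the number of simulated API calls made so far.
	std::uint32_t GetApiCalls() const { return m_ui32ApiCalls; }

private :
	const BlendDescSketch *	m_pbdPending;	// Requested since the last render.
	const BlendDescSketch *	m_pbdLast;		// Applied at the last render.
	BlendDescSketch			m_bdSent;		// Per-field values last sent to the API.
	std::uint32_t			m_ui32ApiCalls;
};
```

Switching between two descriptions that differ only in the source blend factor then costs a single simulated call, and switching back and forth between renders costs nothing as long as the description in effect at PreRender() is the one that was applied last.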

Layer 1 is closest to the metal and ensures that no matter what you do you can’t make any unnecessary calls to Direct3D 11. The second layer introduces a translation step between the engine state identifiers and those of the API. In most cases this translation can be free (done at compile-time). For example LSG_BUFFERS is the engine’s buffer enumerator, and can be translated at compile-time by declaring it differently in each API header.

	/**
	 * Buffers.
	 */
		LSG_B_COLOR_BUFFER						= (1UL << 16UL),

	/**
	 * Buffers.
	 */

	/**
	 * Buffers.
	 */
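As a hedged sketch of what such per-API declarations could look like: the LSG_OGL macro appears elsewhere in the engine, but LSG_DX9, the SKETCH_ constants and ClearFlagsForApi() are assumptions made for illustration.  The stand-in constants carry the values of GL_COLOR_BUFFER_BIT and D3DCLEAR_TARGET so that the sketch compiles without the real API headers.  The fallback branch reuses the one value that survives in the listing above; which API that value belonged to is not stated.

```cpp
#include <cstdint>

// Stand-ins so the sketch compiles without the real API headers.  In real code
// these would be GL_COLOR_BUFFER_BIT (OpenGL) and D3DCLEAR_TARGET (Direct3D 9).
#define SKETCH_GL_COLOR_BUFFER_BIT		0x00004000
#define SKETCH_D3DCLEAR_TARGET			0x00000001

#if defined( LSG_OGL )
// OpenGL build: the engine enumerator is the API value itself.
enum LSG_BUFFERS {
	LSG_B_COLOR_BUFFER					= SKETCH_GL_COLOR_BUFFER_BIT
};
#elif defined( LSG_DX9 )
// Direct3D 9 build (LSG_DX9 is a hypothetical macro name).
enum LSG_BUFFERS {
	LSG_B_COLOR_BUFFER					= SKETCH_D3DCLEAR_TARGET
};
#else
// Fallback: an engine-chosen bit, as in the surviving line of the listing.
enum LSG_BUFFERS {
	LSG_B_COLOR_BUFFER					= (1UL << 16UL)
};
#endif

// Engine code passes the enumerator straight through to the API; there is no
// run-time lookup table because the translation happened when the header compiled.
inline std::uint32_t ClearFlagsForApi( LSG_BUFFERS _bBuffers ) {
	return static_cast<std::uint32_t>(_bBuffers);
}
```

Because each build sees only one of the declarations, code written against LSG_B_COLOR_BUFFER compiles to the native constant with no translation cost at run-time.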

Some cases however have to be handled with a bit of run-time logic.

	/**
	 * Sets depth testing to on or off.
	 *
	 * \param _bVal Whether depth testing is enabled or disabled.
	 */
		_bVal = _bVal != false;
		if ( m_bDepthTest != _bVal ) {
			m_bDepthTest = _bVal;
			if ( _bVal ) { ::glEnable( GL_DEPTH_TEST ); }
			else { ::glDisable( GL_DEPTH_TEST ); }
		}

Temporary Conclusion

For now I will leave it at that.  I will post my LSGDirectX11.h and LSGDirectX11.cpp files when done, as well as my LSGDirectX9.h and LSGDirectX9.cpp to show how Direct3D 9 can efficiently emulate Direct3D 11.



L. Spiro

About L. Spiro

L. Spiro is a professional actor, programmer, and artist, with a bit of dabbling in music.

[Senior Core Tech Engineer]/[Motion Capture] at Deep Silver Dambuster Studios on:
  * Homefront: The Revolution
  * UNANNOUNCED

[Senior Graphics Programmer]/[Motion Capture] at Square Enix on:
  * Luminous Studio engine
  * Final Fantasy XV

[R&D Programmer] at tri-Ace on:
  * Phantasy Star Nova
  * Star Ocean: Integrity and Faithlessness
  * Silent Scope: Bone Eater
  * Danball Senki W

[Programmer] on:
  * Leisure Suit Larry: Beach Volley
  * Ghost Recon 2 Online
  * HOT PXL
  * 187 Ride or Die
  * Ready Steady Cook
  * Tennis Elbow

L. Spiro is currently a GPU performance engineer at Apple Inc.

28 Awesome Comments So Far

  1. John
    January 16, 2014 at 10:46 AM #

    I’m trying to base my engine off d3d11 as well; however, I’m wondering how you think constant buffers should be handled in gl es 2.0? I can’t seem to find a way that works well with the uniforms in es. Do you think it would be better to emulate uniforms in d3d instead?

    • L. Spiro
      January 17, 2014 at 7:23 AM #

      This is one area where you have to meet in the middle on both APIs. Although I am using constant buffers in Direct3D 11, I am still using an API to access them similar to that of Direct3D 9 and OpenGL/OpenGL ES *.

      In OpenGL, OpenGL ES 2.0, and Direct3D 9 my own shader language extracts the addresses of uniforms and keeps track of their current values. You use an ID provided by my library to access them, but it’s just an array index internally so there is no overhead. You call SetVector() or SetMatrix() and the system checks your new value with the value already set on that uniform, updating the uniform only if it has actually changed (while I have already made this fast, I will improve it later by adding a flag to some that change often and skip the redundancy check).

      In Direct3D 11, the same interface (SetVector(), SetMatrix(), etc.) is used, but the ID the system gives you is actually 2 indices: One for the constant buffer in which the value resides and 1 for its offset inside that buffer. Again, no overhead.

      In this case, I do a redundancy check again, but when a value is found to actually have changed I set a flag on the whole constant buffer. Once that flag is set, redundancy checks for that constant buffer are no longer performed because the whole constant buffer will be remapped and uploaded. Constant buffers not flagged are not remapped.
      This is a very efficient system even though the API to access the constant buffers is still similar to older APIs.

      You don’t need a whole new shader language like the one I have created to implement this.
      You just need the basic API I described to set things quickly and some dirty flags to make sure you never send to the GPU anything you don’t need to send.

      This is a long way of saying it is better to emulate the uniform system of OpenGL ES 2.0, but I wanted to give details on how I actually do it as well.
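A minimal sketch of the dirty-flag scheme described in this reply, under stated assumptions: UniformHandle and CConstantBufferCache are hypothetical names, the handle is a buffer index plus a byte offset as the reply describes, and the upload itself is only counted.  In Direct3D 11 the marked spot is where a buffer would typically be mapped, copied into and unmapped.  No bounds checking is done on the offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical handle: which constant buffer, and the byte offset inside it.
struct UniformHandle {
	std::uint32_t ui32Buffer;
	std::uint32_t ui32Offset;
};

class CConstantBufferCache {
public :
	CConstantBufferCache( std::size_t _sBuffers, std::size_t _sBytesEach ) :
		m_vData( _sBuffers, std::vector<std::uint8_t>( _sBytesEach, 0 ) ),
		m_vDirty( _sBuffers, false ) {}

	// Sets a 4-component vector inside a CPU-side copy of its constant buffer.
	void SetVector( const UniformHandle &_uhHandle, const float (&_fVec)[4] ) {
		std::uint8_t * pui8Dst = &m_vData[_uhHandle.ui32Buffer][_uhHandle.ui32Offset];
		// Once a buffer is dirty the whole buffer will be uploaded anyway, so
		// the redundancy check is skipped for it.
		if ( !m_vDirty[_uhHandle.ui32Buffer] &&
			std::memcmp( pui8Dst, _fVec, sizeof( _fVec ) ) == 0 ) { return; }
		std::memcpy( pui8Dst, _fVec, sizeof( _fVec ) );
		m_vDirty[_uhHandle.ui32Buffer] = true;
	}

	// Called before a draw.  Returns how many buffers needed an upload.
	std::size_t Flush() {
		std::size_t sUploaded = 0;
		for ( std::size_t I = 0; I < m_vDirty.size(); ++I ) {
			if ( m_vDirty[I] ) {
				// (The real upload of m_vData[I] to the GPU would go here.)
				m_vDirty[I] = false;
				++sUploaded;
			}
		}
		return sUploaded;
	}

private :
	std::vector<std::vector<std::uint8_t> >	m_vData;	// CPU-side copies.
	std::vector<bool>						m_vDirty;	// One flag per buffer.
};
```

Setting a value that is already stored leaves its buffer clean, so Flush() skips it entirely; any real change marks the whole buffer for a single upload.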

      L. Spiro

  2. Ivan
    February 28, 2014 at 12:09 AM #

    Very impressive, keep up the good work. Also, I’m very interested in the material system, which supports both forward and deferred shading/lighting, and they can be switched at run-time. Do you have a nice idea about implementing this? Hope you can give some brief details.

  3. spalmer
    June 28, 2014 at 12:01 AM #

    A few years ago when I started learning C/C++, I looked around online to see if using Visual Studio and Visual C++ was a good idea, and if it was different than normal C++. At the time, it seemed like nobody thought it was a good idea, and it should be avoided. After finding and reading your entire site, I came across this page where you’re not only using Visual Studio, you’re using 2008.

    After gaining a huge amount of respect for you after delving into this website filled with probably only a small portion of your knowledge, I had to ask what your opinion was on Visual Studio and Visual C++ as opposed to…well any other way to code in C++. Also, why you’re using 2008 as opposed to any newer ones, out of curiosity mostly.

    • L. Spiro
      June 28, 2014 at 12:24 AM #

      Visual Studio has been the standard on Windows for more than just “a few years ago”, so I don’t know about your sources, but it is true that because of its popularity there are a lot of people who don’t know that some of its features are simply Visual Studio specific extensions of the language, such as allowing a comma after the last value in an enum.
      Whatever hate Visual Studio has garnered over the years is mainly based on this. I have also found virtually universally (even haters admit) that Visual Studio has no equal in regards to debugging. This is probably the single largest reason why it is the de-facto standard on Windows—never underestimate the value of a good debugger.

      All that being said, I also use Xcode on Mac OS X and iOS. Naturally. I have no real beefs with that compiler either, and in fact it has some useful analytical features as well, finding dead stores etc.

      The fact that Visual Studio does not have these is not a down-side, it just isn’t a plus-side. In other words, Visual Studio doesn’t have any notable negative points, it just lacks a few positive points.
      You could argue the same for Xcode, starting from its most recent release. You’d have the ultimate compiler and debugger if you combined the best of both of them.

      As for why I use Visual Studio 2008, the reason is fairly simple: I have a license for the professional version.
      My first memory of Visual Studio was with an express version and at start-up everything I made with it would show a pop-up you would have to close before getting to my product.
      I have never trusted an express version again.

      I use Visual Studio 2010, 2012, and 2013 at work. 2010 takes 10 hours to load and I will never use that for my own purposes. The rest work perfectly fine—I have no problem with them—but I will never use an express edition again and the rest have license fees. I will get a license eventually but right now I don’t need one.

      L. Spiro

      • spalmer
        June 28, 2014 at 2:44 AM #

        Well good to know I’ve been wasting my time doing it the hard way! Good experience anyway I guess so I can’t complain. Thanks for the advice though! I have an ultimate/professional (can’t remember which) 2010 license free from the college I attended, so I’ll be switching to that. You’re right though, compared to the others, it does take forever to load.

  4. s
    July 15, 2014 at 6:05 AM #

    This may be a very stupid question, but I will take my chances.
    How do you load models into the vertex/index buffers? Have you written your own importer?


    • L. Spiro
      July 15, 2014 at 11:58 PM #

      I wrote a converter from FBX to my own file format which has the index and vertex buffers embedded into it.
      Loading from there is trivial.

      L. Spiro

      • Joe
        August 11, 2017 at 9:17 AM #

        Hi L. Spiro,

        I was wondering, did you use the FBX SDK to write your converter or do you have the specification? I have my own loaders with the added stipulation that these libraries are in pure C and don’t rely on any 3rd party libraries whatsoever so I’m trying to work my way around needing the C++ FBX SDK. Are you able to share the spec?



        • L. Spiro
          August 11, 2017 at 9:50 AM #

          I use the FBX SDK. I cannot possibly recommend trying to decode FBX files manually. You will never finish that task not just due to the complexity of the file format itself, but because of how necessary the FBX helper functions are for evaluating scenes and determining final geometry and poses.
          Trying to load an FBX file manually requires almost rewriting a good chunk of Maya. The FBX SDK doesn’t just parse out the already-confusing data from the file, it lets you pick a time in an animation and it will evaluate the entire scene hierarchy at that point, which means traversing the scene graph and for each node evaluating the order of transforms plus interpolating between keyframes using various types of interpolation, from saws to linear to various splines. The helper functions provided by the SDK are not insignificant, and trying to do it all yourself is absolutely guaranteed to give you errors handling special cases that you will have to keep finding and fixing for years to come. You won’t be able to handle all versions of FBX files, and they come in both binary and text.
          It’s not worth spending a ton of time to create something you will never know for-sure works on every FBX file and handles every feature properly.

          L. Spiro

          • Joe
            August 13, 2017 at 12:47 PM #

            Thanks Spiro. I’ve already done this for OpenFlight and Collada formats though so I think your caution seems a bit misplaced. Again, I can’t use the FBX SDK with a pure C application. Moreover, it isn’t necessary to handle all formats and features in existence, just those generated with tested versions of major modeling tools within the industry and the features needed for rendering. One can’t even count on the FBX SDK to give similar results across separate versions of the SDK and version of these tools (like Maya), same with Collada, so why wouldn’t rolling your own make sense? Why everyone is comfortable using an industry standard format without a published spec is beyond me. When you have to use a 3rd party library such as FBX SDK, you are essentially expending the energy to learn the spec anyway, just not how the data in the source format is actually packed.

  5. Dzmitry
    August 21, 2014 at 5:01 AM #

    Have you looked at Rust yet? You seem to be caring about memory safety and engine architecture, so you may find it “just right” for your needs.

    We’ve got an API abstraction layer in gfx-rs, and thinking about higher-level interfaces now. Your feedback (or participation?) would be highly appreciated.

  6. a5630098
    December 27, 2014 at 5:51 PM #


    • L. Spiro
      December 28, 2014 at 10:39 AM #

      I am the author of MHS yes, but this project is not related to MHS.

      L. Spiro

      • a5630098
        January 2, 2015 at 10:46 PM #

        Thank you very much.

        I wish you a happy New Year!

      • a5630098
        January 3, 2015 at 9:48 AM #


  7. 0xD06C
    October 8, 2015 at 1:24 PM #

    Hi, Spiro

    Do you have an example of how you declare the static variable to obtain the DirectX device?

    I have this one:
    #define D3D9DEVICE D3D9Render::Instance()->GetDevice()

    Where the method Instance is static and returns a pointer to the D3D9Render class.

    Is this correct? I want to avoid singletons.

    How did you code it?

    Thanks for the advice.

    • L. Spiro
      October 8, 2015 at 10:37 PM #

      I only post this after making my reasons for my design clear. They are suitable for me after much thought. It doesn’t mean it is the best design for others, and each person should consider that carefully.

      That being said, I do this:

      	/**
      	 * Class CDirectX9
      	 * \brief Interface with the DirectX 9 API.
      	 *
      	 * Description: Interface with the DirectX 9 API.
      	 */
      	class CDirectX9 : public CGfxBase {
      	public :
      		/**
      		 * Gets the DirectX 9 device.
      		 *
      		 * \return Returns the DirectX 9 device.
      		 */
      		static LSE_INLINE IDirect3DDevice9 *	GetDirectX9Device();

      		/**
      		 * Gets the DirectX 9 object.
      		 *
      		 * \return Returns the DirectX 9 object.
      		 */
      		static LSE_INLINE IDirect3D9 *		GetDirectX9Object();

      	protected :
      		// == Members.
      		/** The DirectX 9 device. */
      		static IDirect3DDevice9 *		m_pd3dDevice;

      		/** The DirectX 9 object. */
      		static IDirect3D9 *			m_d3dObject;
      	};

      In other files:

      		if ( _ui32TotalPrimitives ) {
      			CDirectX9::GetDirectX9Device()->DrawPrimitive( static_cast<D3DPRIMITIVETYPE>(m_ui32Topology),
      				_ui32TotalPrimitives );
      		}

      L. Spiro

  8. MattMatt
    January 1, 2017 at 7:16 AM #

    Hey Spiro

    How did you exactly implement context creation with windows? Do you first create a context and attach to a window or does the window create the context? Are you able to handle multiple windows with your method? Right now, my method looks something like this:

    cnd::Root root;
    auto window1 = root.CreateWindow(WindowParameters("Window 1"));
    auto surface = window1.CreateRenderSurface(cnd::RenderAPI::OPENGL_4);


    auto gpuFeatures = surface->GetGPUFeatures();
    //etc …

    // Create another window:
    auto window2 = root.CreateWindow(WindowParameters("Window 2"));

    • L. Spiro
      January 1, 2017 at 8:51 AM #

      Initialization is broken into 3 parts: Basic initialization, window initialization, and post-window (secondary) initialization.
      For most builds, there is a macro break-down after the window creation to pass what is needed from the window to initialize the graphics API, but in OpenGL on Windows the secondary initialization is merged into window creation.
      	case WM_CREATE : {
      		pwThis->SetHdc( ::GetDC( _hWnd ) );
      #ifdef LSG_OGL
      		// Attach OpenGL.
      		CGraphicsLib::InitGraphicsLibApi( pwThis->GetHdc() );
      #endif	// #ifdef LSG_OGL

      For the main engine, there is no support on any build for multiple windows. This feature is only available on tool builds, and in the case of OpenGL I use Qt, so I allow them to create shared contexts for use in multiple windows.

      L. Spiro

  9. MattMatt
    January 2, 2017 at 7:36 AM #

    Thanks for answering :) Also I read that you separated d3d11/d3d9/opengl builds and distributed several executables. Is this a viable option? Isn’t hard drive space an issue? Do you have some kind of a launcher to select the API before runtime?

    • L. Spiro
      January 2, 2017 at 7:40 AM #

      This is the standard way to do it, as executable sizes are not the issue, the data is. A launcher should be made to handle the details of selecting the executable to run.
      But there are fewer executables than you seem to expect, as you would never ship an OpenGL build for Windows®, only DirectX 11 and DirectX 12.

      L. Spiro

      • MattMatt
        January 2, 2017 at 8:17 AM #

        What about OpenGL on linux/mac? Do you have several openGL executables like openGL4.exe, openGL3.exe?

        • L. Spiro
          January 2, 2017 at 8:19 AM #

          Those platforms are not different enough to warrant macro spaghetti for multiple platforms. A single executable that can downgrade to OpenGL 3.3 at run-time is enough. OpenGL, with its extensions-based implementation, should be built to handle fallbacks anyway, which is why it is a dying API.

          L. Spiro

          • MattMatt
            January 2, 2017 at 8:40 AM #

            I agree with you about openGL. What do you think about vulkan though? Valve claimed that they don’t need a dx backend anymore. . .

          • L. Spiro
            January 2, 2017 at 7:56 PM #

            It wouldn’t be wise to ditch DirectX on Windows, at least not yet, but it currently seems feasible if you really want to.

            L. Spiro

          • Uhlan
            May 31, 2017 at 8:15 PM #

            What do you mean by “which is why it is a dying API” Please elaborate.

          • L. Spiro
            June 1, 2017 at 10:52 AM #

            Being extension-based means developers have to code tons of fallbacks or alternative paths based on what features are available at run-time. It’s burdensome on developers and prone to bugs.
